Abstract: We propose consensus propagation, an asynchronous 
distributed protocol for averaging numbers across a network. 
We establish convergence, characterize the convergence rate for 
regular graphs, and demonstrate that the protocol exhibits better 
scaling properties than pairwise averaging, an alternative that has 
received much recent attention. Consensus propagation can be 
viewed as a special case of belief propagation, and our results 
contribute to the belief propagation literature. In particular, 
beyond singly-connected graphs, there are very few classes of 
relevant problems for which belief propagation is known to 
converge. 

Index Terms: belief propagation, distributed averaging, distributed
consensus, distributed signal processing, Gaussian 
Markov random fields, message-passing algorithms, max-product 
algorithm, min-sum algorithm, sum-product algorithm. 



I. Introduction 

CONSIDER a network of $n$ nodes in which the $i$th
node observes a real number $y_i \in \mathbb{R}$ and aims to
compute the average $\bar{y} = \sum_{i=1}^{n} y_i/n$. The design of scalable
distributed protocols for this purpose has received much 
recent attention and is motivated by a variety of potential 
needs. In both wireless sensor and peer-to-peer networks, for 
example, there is interest in simple protocols for computing 
aggregate statistics (see, e.g., [1], [2], [3], [4], [5], [6], [7]), 
and averaging enables computation of several important ones. 
Further, averaging serves as a primitive in the design of more 
sophisticated distributed information processing algorithms. 
For example, a maximum likelihood estimate can be produced 
by an averaging protocol if each node's observations are linear 
in variables of interest and noise is Gaussian [8]. [9] considers 
an averaging problem with applications to load balancing 
and clock synchronization. As another example, averaging 
protocols are central to policy-gradient-based methods for 
distributed optimization of network performance [10]. 

In this paper, we propose and analyze a new protocol,
consensus propagation, for distributed averaging. The 
protocol can operate asynchronously and requires only simple 
iterative computations at individual nodes and communication 
of parsimonious messages between neighbors. There is no 
central hub that aggregates information. Each node only needs 
to be aware of its neighbors; no further information about the 
network topology is required. There is no need for construction 
of a specially-structured overlay network such as a spanning 
tree. It is worth discussing two previously proposed and well-studied
protocols that also exhibit these features: 

1) (probabilistic counting) This protocol is based on ideas 
from [11] for counting distinct elements of a database 
and in [12] was adapted to produce a protocol for 
averaging. The outcome is random, with variance that 
becomes arbitrarily small as the number of nodes grows. 
However, for moderate numbers of nodes, say tens of 
thousands, high variance makes the protocol impractical. 
The protocol can be repeated in parallel and results 
combined in order to reduce variance, but this leads 
to onerous memory and communication requirements. 
Convergence time of the protocol is analyzed in [13]. 

2) (pairwise averaging) In this protocol, each node maintains
its current estimate of the average, and each time
a pair of nodes communicate, they revise their estimates
to both take on the mean of their previous estimates.
Convergence of this protocol in a very general model
of asynchronous computation and communication was
established in [14], and there has been significant follow-on
work, a recent sample of which is [15]. Recent
work [16], [17] has studied the convergence rate and
its dependence on network topology and how pairs of
nodes are sampled. Here, sampling is governed by a
certain doubly stochastic matrix, and the convergence
rate is characterized by its second-largest eigenvalue. 

In terms of convergence rate, probabilistic counting dominates
both pairwise averaging and consensus propagation
in the asymptotic regime. However, consensus propagation
and pairwise averaging are likely to be more effective in
moderately-sized networks (up to hundreds of thousands or
perhaps even millions of nodes). Further, these two protocols
are both naturally studied as iterative matrix algorithms. As
such, pairwise averaging will serve as a baseline to which we
will compare consensus propagation. 

Consensus propagation is a simple algorithm with an intuitive
interpretation. It can also be viewed as an asynchronous
distributed version of belief propagation as applied to approximation
of conditional distributions in a Gaussian Markov random
field. When the network of interest is singly-connected,
prior results about belief propagation imply convergence of
consensus propagation. However, in most cases of interest,
the network is not singly-connected and prior results have
little to say about convergence. In particular, Gaussian belief
propagation on a graph with cycles is not guaranteed to
converge, as demonstrated by numerical examples in [18]. 
In fact, there are very few relevant cases where belief 
propagation on a graph with cycles is known to converge. 
Some fairly general sufficient conditions have been established 
[19], [20], [21], [22], but these conditions are abstract and it 
is difficult to identify interesting classes of problems that meet 
them. One simple case where belief propagation is guaranteed 
to converge is when the graph has only a single cycle and 
variables have finite support [23], [24], [25]. In the context of
decoding low-density parity-check codes, although convergence
has not been guaranteed, [26] establishes desirable
properties of iterates that hold with high probability. Recent
work proposes the use of belief propagation to solve 
maximum-weight matching problems and proves convergence 
in that context [27]. In the Gaussian case, [18], [28] provide 
sufficient conditions for convergence, but these conditions 
are difficult to interpret and do not capture situations that 
correspond to consensus propagation. Since this paper was 
submitted for publication, a general class of results has been 
developed for the convergence of Gaussian belief propagation 
[29], [30]. These results can be viewed as a generalization of 
the convergence results in this paper. However, they do not 
address the issue of rate of convergence. 

With this background, let us discuss the primary contribu- 
tions of this paper: 

1) We propose consensus propagation, a new asynchronous 
distributed protocol for averaging. 

2) We prove that consensus propagation converges even
when executed asynchronously. Since there are so few
classes of relevant problems for which belief propagation
is known to converge, even with synchronous
execution, this is surprising. 

3) We characterize the convergence time in regular graphs 
of the synchronous version of consensus propagation in 
terms of the mixing time of a certain Markov chain 
over edges of the graph. 

4) We explain why the convergence time of consensus 
propagation scales more gracefully with the number of 
nodes than does that of pairwise averaging, and for 
certain classes of graphs, we quantify the improvement. 

It is worth mentioning a recent and related line of research
on the use of belief propagation as an asynchronous distributed
protocol to arrive at consensus among nodes in a network,
when each node makes a conditionally independent observation
of the class of an object and would like to know the
most probable class based on all observations [31]. The authors
establish that belief propagation converges and provides each
node with the most probable class when the network is a tree
or a regular graph. They further show that for a certain class
of random graphs, the result holds in an asymptotic sense as
the number of nodes grows. To deal with general connected
graphs, the authors offer a more complex protocol with convergence
guarantees. It is interesting to note that this classification
problem can be reduced to one of averaging. In particular, if
each node starts out with the conditional probability of each
class given its own observation and the network carries out
a protocol to compute the average log-probability for each
class, each node obtains the conditional probabilities given all
observations. Hence, consensus propagation also solves this
classification problem. 

II. Algorithm 

Consider a connected undirected graph $(V, E)$ with $V = \{1, \ldots, n\}$.
For each node $i \in V$, let $N(i) = \{j \mid (i,j) \in E\}$
be the set of neighbors of $i$. Let $\vec{E} \subseteq V \times V$ be a set consisting
of two directed edges $\{i,j\}$ and $\{j,i\}$ per undirected edge
$(i,j) \in E$. (In general, we will use braces for directed edges
and parentheses for undirected edges.) 

Each node $i \in V$ is assigned a number $y_i \in \mathbb{R}$. The goal is
for each node to obtain an estimate of $\bar{y} = \sum_{i \in V} y_i/n$ through
an asynchronous distributed protocol in which each node carries
out simple computations and communicates parsimonious
messages to its neighbors. 

We propose consensus propagation as an approach to the
aforementioned problem. In this protocol, if a node $i$ communicates
to a neighbor $j$ at time $t$, it transmits a message consisting
of two numerical values. Let $\mu_{ij}^{(t)} \in \mathbb{R}$ and $K_{ij}^{(t)} \in \mathbb{R}_+$
denote the values associated with the most recently transmitted
message from $i$ to $j$ at or before time $t$. At each time $t$, node
$i$ has stored in memory the most recent message from each
neighbor: $\{\mu_{ui}^{(t)}, K_{ui}^{(t)} \mid u \in N(i)\}$. If, at time $t+1$, node $i$
chooses to communicate with a neighboring node $j \in N(i)$, it
constructs a new message that is a function of the set of most
recent messages $\{\mu_{ui}^{(t)}, K_{ui}^{(t)} \mid u \in N(i) \setminus j\}$ received from
neighbors other than $j$. The initial values in memory before
receiving any messages are arbitrary. 

In order to illustrate how the parameter vectors $\mu^{(t)}$ and
$K^{(t)}$ evolve, we will first describe a special case of the consensus
propagation algorithm that is particularly intuitive. Then,
we will describe the general algorithm and its relationship to
belief propagation. 

A. Intuitive Interpretation 

Consider the special case of a singly-connected graph, that
is, a connected graph with no cycles (a tree).
Assume, for the moment, that at every point in time, every pair
of connected nodes communicates. As illustrated in Fig. 1, for
any edge $(i,j) \in E$, there is a set $S_{ij} \subset V$ of nodes, with
$i \in S_{ij}$, that can transmit information to $S_{ji} = V \setminus S_{ij}$, with
$j \in S_{ji}$, only through $(i,j)$. In order for nodes in $S_{ji}$ to
compute $\bar{y}$, they must at least be provided with the average
$\mu_{ij}^*$ among observations at nodes in $S_{ij}$ and the cardinality
$K_{ij}^* = |S_{ij}|$. Similarly, in order for nodes in $S_{ij}$ to compute
$\bar{y}$, they must at least be provided with the average $\mu_{ji}^*$ among
observations at nodes in $S_{ji}$ and the cardinality $K_{ji}^* = |S_{ji}|$.
These values must be communicated through the link $(i,j)$. 

The messages $\mu_{ij}^{(t)}$ and $K_{ij}^{(t)}$, transmitted from node $i$ to
node $j$, can be viewed as iterative estimates of the quantities
$\mu_{ij}^*$ and $K_{ij}^*$. They evolve according to

$$K_{ij}^{(t)} = 1 + \sum_{u \in N(i) \setminus j} K_{ui}^{(t-1)}, \qquad \forall \{i,j\} \in \vec{E}, \tag{1a}$$

$$\mu_{ij}^{(t)} = \frac{y_i + \sum_{u \in N(i) \setminus j} K_{ui}^{(t-1)} \mu_{ui}^{(t-1)}}{1 + \sum_{u \in N(i) \setminus j} K_{ui}^{(t-1)}}, \qquad \forall \{i,j\} \in \vec{E}. \tag{1b}$$

Fig. 1. Interpretation of messages in a singly-connected graph with $\beta = \infty$.



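At each time $t$, each node $i$ computes an estimate of the global
average $\bar{y}$ according to

$$x_i^{(t)} = \frac{y_i + \sum_{u \in N(i)} K_{ui}^{(t)} \mu_{ui}^{(t)}}{1 + \sum_{u \in N(i)} K_{ui}^{(t)}}.$$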

Assume that the algorithm is initialized with $K^{(0)} = 0$.
A simple inductive argument shows that at each time $t \geq 1$,
$\mu_{ij}^{(t)}$ is the average among observations at the nodes in the set
$S_{ij}$ that are at a distance of at most $t-1$ from node
$i$. Furthermore, $K_{ij}^{(t)}$ is the cardinality of this collection of
nodes. Since any node in $S_{ij}$ is at a distance from node $i$ that
is at most the diameter of the graph, if $t$ is greater than the
diameter of the graph, we have $K^{(t)} = K^*$ and $\mu^{(t)} = \mu^*$.
Thus, for any $i \in V$ and $t$ sufficiently large,

$$x_i^{(t)} = \frac{y_i + \sum_{u \in N(i)} K_{ui}^* \mu_{ui}^*}{1 + \sum_{u \in N(i)} K_{ui}^*} = \bar{y},$$

where the last equality holds because the sets $S_{ui}$, $u \in N(i)$, partition $V \setminus \{i\}$.
So, $x_i^{(t)}$ converges to the global average $\bar{y}$. Further, this simple
algorithm converges in as short a time as is possible, since the
diameter of the graph is the minimum amount of time for the
two most distant nodes to communicate. 
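The inductive argument above can be checked on a small example. The following Python sketch runs the unattenuated updates (1a)-(1b) synchronously on a hypothetical six-node tree of diameter 4, starting from $K^{(0)} = 0$. The function names, the tree, and the data are illustrative assumptions, not taken from the paper. Every node's estimate equals $\bar{y}$ from $t = 4$ onward, and each converged $K_{ij}$ equals $|S_{ij}|$.

```python
# Unattenuated consensus propagation (beta = infinity): updates (1a)-(1b)
# run synchronously on a tree, followed by the node estimates x_i^(t).

def neighbors_from_edges(edges):
    """Adjacency lists for an undirected edge list."""
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
    return nbrs


def run_unattenuated(edges, y, steps):
    """Return the final K messages and the estimates x^(t) for t = 1..steps."""
    nbrs = neighbors_from_edges(edges)
    directed = [(i, j) for i in nbrs for j in nbrs[i]]
    K = {e: 0.0 for e in directed}   # K^(0) = 0, as assumed in the text
    mu = {e: 0.0 for e in directed}  # arbitrary: weighted by K^(0) = 0
    history = []
    for _ in range(steps):
        new_K, new_mu = {}, {}
        for i, j in directed:
            others = [u for u in nbrs[i] if u != j]
            k_sum = 1.0 + sum(K[(u, i)] for u in others)              # (1a)
            new_K[(i, j)] = k_sum
            new_mu[(i, j)] = (y[i] + sum(K[(u, i)] * mu[(u, i)]
                                         for u in others)) / k_sum     # (1b)
        K, mu = new_K, new_mu
        history.append({
            i: (y[i] + sum(K[(u, i)] * mu[(u, i)] for u in nbrs[i]))
            / (1.0 + sum(K[(u, i)] for u in nbrs[i]))
            for i in nbrs
        })
    return K, history


# Hypothetical six-node tree with diameter 4 (path 0-1-3-4-5 plus leaf 2).
tree = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5)]
y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
K, hist = run_unattenuated(tree, y, steps=5)
```

In this run the estimate at node 0 is still 2.8 at $t = 3$, the average over the five nodes within distance 3 of it; node 5 (with $y_5 = 9$) only enters its estimate at $t = 4$.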

Now, suppose that the graph has cycles. For any directed
edge $\{i,j\} \in \vec{E}$ that is part of a cycle, $K_{ij}^{(t)} \to \infty$. Hence,
the algorithm does not converge. A heuristic fix might be to
compose the iteration (1a) with one that attenuates:

$$\tilde{K}_{ij}^{(t)} \leftarrow 1 + \sum_{u \in N(i) \setminus j} K_{ui}^{(t-1)}, \qquad K_{ij}^{(t)} \leftarrow \frac{\tilde{K}_{ij}^{(t)}}{1 + \tilde{K}_{ij}^{(t)}/(\beta Q_{ij})}.$$

Here, $Q_{ij} > 0$ and $\beta > 0$ are constants. We can view
the unattenuated algorithm as setting $\beta = \infty$. In the attenuated
algorithm, the message is essentially unaffected when
$\tilde{K}_{ij}^{(t)}/(\beta Q_{ij})$ is small but becomes increasingly attenuated as
$\tilde{K}_{ij}^{(t)}$ grows. This is exactly the kind of attenuation carried
out by consensus propagation. Understanding why this kind
of attenuation leads to desirable results is a subject of our
analysis. 
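A minimal sketch contrasting the two iterations on a hypothetical five-node cycle, assuming a single constant $Q_{ij} = q$ on every edge and $K^{(0)} = 0$. The unattenuated iteration (1a) grows by one per step, while the attenuated version stays below $\beta q$ and settles. On a cycle every $K_{ij}$ shares one value $k$, and clearing denominators in the attenuated fixed-point equation gives $k^2 + k - \beta q = 0$. The function name and parameter values are illustrative.

```python
# Compare the unattenuated K-iteration (1a) with the attenuated composition
# K <- K~ / (1 + K~ / (beta * Q_ij)), here with Q_ij = q on every edge
# (an assumption made only for this sketch).

def max_K_history(edges, steps, beta=None, q=1.0):
    """Largest K_ij after each synchronous step, starting from K = 0.

    beta=None reproduces the unattenuated iteration (1a).
    """
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
    directed = [(i, j) for i in nbrs for j in nbrs[i]]
    K = {e: 0.0 for e in directed}
    out = []
    for _ in range(steps):
        new_K = {}
        for i, j in directed:
            k_tilde = 1.0 + sum(K[(u, i)] for u in nbrs[i] if u != j)   # (1a)
            if beta is None:
                new_K[(i, j)] = k_tilde
            else:
                new_K[(i, j)] = k_tilde / (1.0 + k_tilde / (beta * q))  # attenuation
        K = new_K
        out.append(max(K.values()))
    return out


cycle = [(i, (i + 1) % 5) for i in range(5)]   # hypothetical 5-node cycle
plain = max_K_history(cycle, 50)               # grows by 1 every step
damped = max_K_history(cycle, 50, beta=10.0)   # stays below beta * q = 10
```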

B. General Algorithm 

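Consensus propagation is parameterized by a scalar $\beta > 0$
and a non-negative matrix $Q \in \mathbb{R}_+^{n \times n}$ with $Q_{ij} > 0$ if and
only if $i \neq j$ and $(i,j) \in E$. For each $\{i,j\} \in \vec{E}$, it is useful
to define the following three functions:

$$\mathcal{F}_{ij}(K) = \frac{1 + \sum_{u \in N(i) \setminus j} K_{ui}}{1 + \frac{1}{\beta Q_{ij}}\left(1 + \sum_{u \in N(i) \setminus j} K_{ui}\right)}, \tag{2a}$$

$$\mathcal{G}_{ij}(\mu, K) = \frac{y_i + \sum_{u \in N(i) \setminus j} K_{ui} \mu_{ui}}{1 + \sum_{u \in N(i) \setminus j} K_{ui}}, \tag{2b}$$

$$\mathcal{X}_i(\mu, K) = \frac{y_i + \sum_{u \in N(i)} K_{ui} \mu_{ui}}{1 + \sum_{u \in N(i)} K_{ui}}. \tag{2c}$$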

For each $t$, denote by $U_t \subseteq \vec{E}$ the set of directed edges
along which messages are transmitted at time $t$. Consensus
propagation is presented below as Algorithm 1. 

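Algorithm 1 Consensus propagation.
1: for time $t = 1$ to $\infty$ do
2:   for all $\{i,j\} \in U_t$ do
3:     $K_{ij}^{(t)} \leftarrow \mathcal{F}_{ij}(K^{(t-1)})$
4:     $\mu_{ij}^{(t)} \leftarrow \mathcal{G}_{ij}(\mu^{(t-1)}, K^{(t-1)})$
5:   end for
6:   for all $\{i,j\} \notin U_t$ do
7:     $K_{ij}^{(t)} \leftarrow K_{ij}^{(t-1)}$
8:     $\mu_{ij}^{(t)} \leftarrow \mu_{ij}^{(t-1)}$
9:   end for
10:  $x^{(t)} \leftarrow \mathcal{X}(\mu^{(t)}, K^{(t)})$
11: end for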



Consensus propagation is a distributed protocol because
computations at each node require only information that
is locally available. In particular, the messages
$K_{ij}^{(t)} = \mathcal{F}_{ij}(K^{(t-1)})$ and $\mu_{ij}^{(t)} = \mathcal{G}_{ij}(\mu^{(t-1)}, K^{(t-1)})$ transmitted from
node $i$ to node $j$ depend only on $\{\mu_{ui}^{(t-1)}, K_{ui}^{(t-1)} \mid u \in N(i)\}$,
which node $i$ has stored in memory. Similarly, $x_i^{(t)}$, which
serves as an estimate of $\bar{y}$, depends only on $\{\mu_{ui}^{(t)}, K_{ui}^{(t)} \mid u \in N(i)\}$.

Consensus propagation is an asynchronous protocol because 
only a subset of the potential messages are transmitted at 
each time. Our convergence analysis can also be extended
to accommodate more general models of asynchronism that
involve communication delays, such as those presented in [32]. 
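A minimal Python sketch of Algorithm 1 follows. It assumes a hypothetical random schedule in which each directed edge joins $U_t$ independently with probability `p_send` (so every edge is updated infinitely often with probability one), and it stores $Q$ as a dictionary keyed by undirected edges; all names, the graph, and the data are illustrative. As a reference, `gaussian_mode` solves $(I + \beta\Gamma)x = y$, the first-order condition of the quadratic program (4), directly. Theorem 2 predicts that the asynchronous estimates approach this mode.

```python
import random


def neighbors(edges):
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    return nbrs


def consensus_propagation(edges, y, beta, Q, steps, p_send=0.3, seed=0):
    """Asynchronous consensus propagation in the style of Algorithm 1.

    Q maps each undirected edge, keyed as (min(i, j), max(i, j)), to Q_ij > 0.
    U_t includes each directed edge independently with probability p_send
    (a hypothetical schedule). Returns x^(t) = X(mu^(t), K^(t)) at the end.
    """
    rng = random.Random(seed)
    nbrs = neighbors(edges)
    directed = [(i, j) for i in nbrs for j in nbrs[i]]
    K = {e: 0.0 for e in directed}
    mu = {e: 0.0 for e in directed}
    for _ in range(steps):
        U = [e for e in directed if rng.random() < p_send]
        new_K, new_mu = dict(K), dict(mu)   # edges outside U_t keep their values
        for i, j in U:
            s = 1.0 + sum(K[(u, i)] for u in nbrs[i] if u != j)
            bq = beta * Q[(min(i, j), max(i, j))]
            new_K[(i, j)] = s / (1.0 + s / bq)                             # (2a)
            new_mu[(i, j)] = (y[i] + sum(K[(u, i)] * mu[(u, i)]
                                         for u in nbrs[i] if u != j)) / s   # (2b)
        K, mu = new_K, new_mu
    return [(y[i] + sum(K[(u, i)] * mu[(u, i)] for u in nbrs[i]))
            / (1.0 + sum(K[(u, i)] for u in nbrs[i]))                      # (2c)
            for i in range(len(y))]


def gaussian_mode(edges, y, beta, Q):
    """Solve (I + beta * Gamma) x = y by Gaussian elimination; no pivoting is
    needed because the matrix is symmetric positive definite."""
    n = len(y)
    A = [[float(r == c) for c in range(n)] + [float(y[r])] for r in range(n)]
    for i, j in edges:
        w = beta * Q[(min(i, j), max(i, j))]
        A[i][i] += w
        A[j][j] += w
        A[i][j] -= w
        A[j][i] -= w
    for c in range(n):
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


# Hypothetical graph with cycles, Q_ij = 1 on every edge, and beta = 5.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5), (1, 4)]
Q = {(min(i, j), max(i, j)): 1.0 for i, j in edges}
y = [0.1, 0.9, 0.4, 0.7, 0.2, 0.6]
x_cp = consensus_propagation(edges, y, beta=5.0, Q=Q, steps=2000)
x_mode = gaussian_mode(edges, y, beta=5.0, Q=Q)
```

With the seed fixed the run is deterministic; changing `seed` or `p_send` alters the trajectory but, per Theorem 2, not the limit.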

In our study of convergence time, we will focus on the
synchronous version of consensus propagation, in which
$U_t = \vec{E}$ for all $t$. Note that synchronous consensus propagation
is defined by:

$$K^{(t)} = \mathcal{F}(K^{(t-1)}), \tag{3a}$$

$$\mu^{(t)} = \mathcal{G}(\mu^{(t-1)}, K^{(t-1)}), \tag{3b}$$

$$x^{(t)} = \mathcal{X}(\mu^{(t)}, K^{(t)}). \tag{3c}$$

C. Relation to Belief Propagation 

Consensus propagation can also be viewed as a special case
of belief propagation. In this context, belief propagation is
used to approximate the marginal distributions of a vector
$x \in \mathbb{R}^n$ conditioned on the observations $y \in \mathbb{R}^n$. The mode of each
of the marginal distributions approximates $\bar{y}$. 

Take the prior distribution over $(x, y)$ to be the normalized
product of potential functions $\{\psi_i(\cdot) \mid i \in V\}$ and compatibility
functions $\{\gamma_{ij}(\cdot) \mid (i,j) \in E\}$, given by

$$\psi_i(x_i) = \exp\left(-(x_i - y_i)^2\right), \qquad \gamma_{ij}(x_i, x_j) = \exp\left(-\beta Q_{ij}(x_i - x_j)^2\right),$$

where $Q_{ij} > 0$, for each edge $(i,j) \in E$, and $\beta > 0$ are
constants. Note that $\beta$ can be viewed as an inverse temperature
parameter; as $\beta$ increases, components of $x$ associated with
adjacent nodes become increasingly correlated. 

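Let $\Gamma$ be a positive semidefinite symmetric matrix such that

$$x^\top \Gamma x = \sum_{(i,j) \in E} Q_{ij}(x_i - x_j)^2.$$

Note that when $Q_{ij} = 1$ for all edges $(i,j) \in E$, $\Gamma$ is
the graph Laplacian. Given the vector $y$ of observations, the
conditional density of $x$ is

$$p^\beta(x) \propto \exp\left(-\|x - y\|_2^2 - \beta x^\top \Gamma x\right).$$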

Let $x^\beta$ denote the mode (maximizer) of $p^\beta(\cdot)$. Since the
distribution is Gaussian, each component $x_i^\beta$ is also the mode
of the corresponding marginal distribution. Note that $x^\beta$ is
the unique solution to the positive definite quadratic program

$$\min_{x} \; \|x - y\|_2^2 + \beta x^\top \Gamma x. \tag{4}$$

The following theorem relates $x^\beta$ to the mean value $\bar{y}$. 

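Theorem 1: $\mathbf{1}^\top x^\beta/n = \bar{y}$ and $\lim_{\beta\uparrow\infty} x_i^\beta = \bar{y}$, for all $i \in V$.

Proof: The first-order conditions for optimality imply
$(I + \beta\Gamma)x^\beta = y$. If we set $\mathbf{1} = (1, \ldots, 1)^\top \in \mathbb{R}^n$, we
have $\Gamma\mathbf{1} = 0$; since $\Gamma$ is symmetric, multiplying both sides by
$\mathbf{1}^\top$ gives $\mathbf{1}^\top x^\beta/n = \mathbf{1}^\top y/n = \bar{y}$. Let $U$ be an
orthogonal matrix and $D$ a diagonal matrix that form a spectral
decomposition of $\Gamma$, that is, $\Gamma = U^\top D U$. Then, we have
$x^\beta = U^\top (I + \beta D)^{-1} U y$. Since the graph is connected and $Q_{ij} > 0$ on every edge,
$\Gamma$ has eigenvalue 0 with multiplicity 1 and corresponding normalized eigenvector
$\mathbf{1}/\sqrt{n}$, and all other eigenvalues $d_2, \ldots, d_n$ of $\Gamma$ are positive.
Then, if $D = \mathrm{diag}(0, d_2, \ldots, d_n)$,

$$\lim_{\beta\uparrow\infty} x^\beta = \lim_{\beta\uparrow\infty} U^\top \mathrm{diag}\left(1, \frac{1}{1 + \beta d_2}, \ldots, \frac{1}{1 + \beta d_n}\right) U y = \mathbf{1}\mathbf{1}^\top y/n.$$

■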

The above theorem suggests that if $\beta$ is sufficiently large, then
each component $x_i^\beta$ can be used as an estimate of $\bar{y}$. 
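A quick numerical check of Theorem 1, assuming $Q_{ij} = 1$ (so that $\Gamma$ is the graph Laplacian) on a hypothetical eight-node cycle with one chord. The sketch solves the first-order condition $(I + \beta\Gamma)x^\beta = y$ from the proof by Gaussian elimination. The average of $x^\beta$ matches $\bar{y}$ for every $\beta$, and $\|x^\beta - \bar{y}\mathbf{1}\|_2$ shrinks as $\beta$ grows. The graph, data, and function name are illustrative.

```python
def solve_mode(n, edges, y, beta):
    """x^beta = (I + beta * Gamma)^{-1} y with Q_ij = 1 (Gamma = graph Laplacian)."""
    A = [[float(r == c) for c in range(n)] + [float(y[r])] for r in range(n)]
    for i, j in edges:
        A[i][i] += beta
        A[j][j] += beta
        A[i][j] -= beta
        A[j][i] -= beta
    for c in range(n):                       # forward elimination
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):             # back substitution
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x


n = 8
edges = [(i, (i + 1) % n) for i in range(n)] + [(0, 4)]   # hypothetical: cycle plus a chord
y = [0.0, 1.0, 0.25, 0.75, 0.5, 0.9, 0.1, 0.6]
ybar = sum(y) / n
for beta in (1.0, 10.0, 100.0, 1000.0):
    x = solve_mode(n, edges, y, beta)
    dev = sum((xi - ybar) ** 2 for xi in x) ** 0.5
    print(f"beta={beta:7.1f}  mean(x)-ybar={sum(x) / n - ybar:+.1e}  ||x - ybar*1||_2={dev:.4f}")
```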

In belief propagation, messages are passed along edges of a
Markov random field. In our case, because of the structure of
the distribution $p^\beta(\cdot)$, the relevant Markov random field has
the same topology as the graph $(V, E)$. The message $M_{ij}^{(t)}(\cdot)$
passed from node $i$ to node $j$ at time $t$ is a distribution on the
variable $x_j$. Node $i$ computes this message using incoming
messages from other nodes as defined by the update equation

$$M_{ij}^{(t)}(x_j) = \kappa \int \psi_i(x_i)\, \gamma_{ij}(x_i, x_j) \prod_{u \in N(i) \setminus j} M_{ui}^{(t-1)}(x_i)\, dx_i. \tag{5}$$

Here, $\kappa$ is a normalizing constant. Since our underlying
distribution $p^\beta(\cdot)$ is Gaussian, it is natural to consider messages
which are Gaussian distributions. In particular, let
$(\mu_{ij}^{(t)}, K_{ij}^{(t)}) \in \mathbb{R} \times \mathbb{R}_+$ parameterize the Gaussian message $M_{ij}^{(t)}(\cdot)$
according to

$$M_{ij}^{(t)}(x_j) \propto \exp\left(-K_{ij}^{(t)}\left(x_j - \mu_{ij}^{(t)}\right)^2\right).$$

Then, (5) is equivalent to the synchronous consensus propagation
iterations for $K^{(t)}$ and $\mu^{(t)}$.
The sequence of densities

$$p_j^{(t)}(x_j) \propto \psi_j(x_j) \prod_{i \in N(j)} M_{ij}^{(t)}(x_j) \propto \exp\left(-(x_j - y_j)^2 - \sum_{i \in N(j)} K_{ij}^{(t)}\left(x_j - \mu_{ij}^{(t)}\right)^2\right)$$

is meant to converge to an approximation of the marginal
conditional distribution of $x_j$. As such, an approximation to
$x_j^\beta$ is given by maximizing $p_j^{(t)}(\cdot)$. It is easy to show that
the maximum is attained by $x_j^{(t)} = \mathcal{X}_j(\mu^{(t)}, K^{(t)})$. With
this and the aforementioned correspondences, we have shown that
consensus propagation is a special case of belief propagation,
and more specifically, Gaussian belief propagation. 
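To see the correspondence between (5) and (2a)-(2b) concretely: with $s = 1 + \sum_{u \in N(i) \setminus j} K_{ui}$ and $m = \mathcal{G}_{ij}(\mu, K)$, the integrand in (5) is proportional to $\exp(-s(x_i - m)^2 - \beta Q_{ij}(x_i - x_j)^2)$, and integrating out $x_i$ leaves a Gaussian in $x_j$ with mean $m$ and precision $s\beta Q_{ij}/(s + \beta Q_{ij})$, which is (2a). The sketch below checks this numerically for one hypothetical set of incoming Gaussian messages. It evaluates the integral by a Riemann sum, reads the precision and mean off the logarithm of the result (which is quadratic in $x_j$), and compares them with $\mathcal{F}_{ij}$ and $\mathcal{G}_{ij}$. The grid, integration range, parameter values, and function names are assumptions chosen for illustration.

```python
import math


def log_bp_message(xj, yi, K_in, mu_in, beta, q, lo=-20.0, hi=20.0, h=1e-3):
    """log of the integral in (5) at x_j, by a Riemann sum over x_i.

    Incoming messages from u in N(i) minus j are Gaussian with parameters
    (mu_in[k], K_in[k]); normalizing constants are dropped.
    """
    steps = int(round((hi - lo) / h))
    total = math.fsum(
        math.exp(-(x - yi) ** 2
                 - sum(k * (x - m) ** 2 for k, m in zip(K_in, mu_in))
                 - beta * q * (x - xj) ** 2)
        for x in (lo + h * s for s in range(steps + 1))
    )
    return math.log(total * h)


def fit_message(yi, K_in, mu_in, beta, q):
    """Read (K_ij, mu_ij) off the log-message, which is -K (x_j - mu)^2 + const."""
    lm, l0, lp = (log_bp_message(x, yi, K_in, mu_in, beta, q) for x in (-1.0, 0.0, 1.0))
    K_new = -(lp - 2.0 * l0 + lm) / 2.0
    mu_new = (lp - lm) / (4.0 * K_new)
    return K_new, mu_new


def consensus_update(yi, K_in, mu_in, beta, q):
    """F_ij and G_ij from (2a)-(2b) for the same incoming messages."""
    s = 1.0 + sum(K_in)
    return s / (1.0 + s / (beta * q)), (yi + sum(k * m for k, m in zip(K_in, mu_in))) / s


args = (0.3, [0.8, 1.7], [-0.4, 1.1], 2.0, 1.5)   # hypothetical y_i, K, mu, beta, Q_ij
print(fit_message(*args))
print(consensus_update(*args))
```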

Readers familiar with belief propagation will notice that in 
the derivation above we have used the sum-product form of 
the algorithm. In this case, since the underlying distribution is 
Gaussian, the max-product form yields equivalent iterations. 

D. Relation to Prior Results 

In light of the fact that consensus propagation is a special 
case of Gaussian belief propagation, it is natural to ask 
what prior results on belief propagation — Gaussian or more 
broadly — have to say in this context. Results from [28], [18], 
[33] establish that, in the absence of degeneracy, Gaussian 
belief propagation has a unique fixed point and that the mode 
of this fixed point is unbiased. The issue of convergence,
however, remains poorly understood. As observed numerically
in [18], Gaussian belief propagation can diverge, even
in the absence of degeneracy. Abstract sufficient conditions
for convergence that have been developed in [28], [18] are
difficult to verify in the consensus propagation case. 
III. Convergence 

As we have discussed, Gaussian belief propagation can 
diverge, even when the graph has a single cycle. One might 
expect the same from consensus propagation. However, the 
following theorem establishes convergence. 

Theorem 2: The following hold:

(i) There exist unique vectors $(\mu^\beta, K^\beta)$ such that $K^\beta = \mathcal{F}(K^\beta)$ and $\mu^\beta = \mathcal{G}(\mu^\beta, K^\beta)$.

(ii) Suppose that each directed edge appears infinitely
often in the sequence of communication sets $\{U_t\}$. Then,
independent of the initial condition $(\mu^{(0)}, K^{(0)})$,

$$\lim_{t\to\infty} K^{(t)} = K^\beta, \quad \text{and} \quad \lim_{t\to\infty} \mu^{(t)} = \mu^\beta.$$

(iii) Given $(\mu^\beta, K^\beta)$, if $x^\beta = \mathcal{X}(\mu^\beta, K^\beta)$, then $x^\beta$ is the
mode of the distribution $p^\beta(\cdot)$.

Note that the condition on the communication sets in Theorem 2(ii)
corresponds to total asynchronism in the language
of [32]. This is a weak assumption which ensures only that
every component of $\mu^{(t)}$ and $K^{(t)}$ is updated infinitely often. 

The proof of this theorem is deferred until the appendix, but
it rests on two ideas. First, notice that, according to the update
equation (2a), $K^{(t)}$ evolves independently of $\mu^{(t)}$. Hence, we
analyze $K^{(t)}$ first. Following the work in [18], we prove that
the functions $\mathcal{F}_{ij}$ are monotonic. This property is used to
establish convergence to a unique fixed point. Next, we analyze
$\mu^{(t)}$ assuming that $K^{(t)}$ has already converged. Given fixed $K$,
the update equations for $\mu^{(t)}$ are linear, and we establish that
they induce a contraction with respect to the maximum norm.
This allows us to establish existence of a fixed point and both
synchronous and asynchronous convergence. 
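The second idea can be illustrated numerically. For fixed $K$, $\mathcal{G}_{ij}$ places weight $K_{ui}/s_{ij}$ on each incoming mean, where $s_{ij} = 1 + \sum_{u \in N(i) \setminus j} K_{ui}$, so $|\mathcal{G}_{ij}(\mu, K) - \mathcal{G}_{ij}(\mu', K)| \leq \frac{s_{ij} - 1}{s_{ij}} \|\mu - \mu'\|_\infty$. The sketch below, using a hypothetical four-node graph with $Q_{ij} = 1$ and $\beta = 4$, computes this factor at (an approximation of) $K^\beta$ and checks the bound on random pairs of mean vectors. It illustrates the contraction claim and is not the appendix proof; the helper names are illustrative.

```python
import random


def build(edges):
    """Adjacency lists and the list of directed edges {i, j}."""
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
    return nbrs, [(i, j) for i in nbrs for j in nbrs[i]]


def fixed_K(nbrs, directed, beta, iters=500):
    """Approximate K^beta by iterating K <- F(K) from K = 0, with Q_ij = 1."""
    K = {e: 0.0 for e in directed}
    for _ in range(iters):
        new_K = {}
        for i, j in directed:
            s = 1.0 + sum(K[(u, i)] for u in nbrs[i] if u != j)
            new_K[(i, j)] = s / (1.0 + s / beta)
        K = new_K
    return K


def G(nbrs, directed, y, mu, K):
    """The mean update (2b) applied to every directed edge, with K held fixed."""
    out = {}
    for i, j in directed:
        others = [u for u in nbrs[i] if u != j]
        s = 1.0 + sum(K[(u, i)] for u in others)
        out[(i, j)] = (y[i] + sum(K[(u, i)] * mu[(u, i)] for u in others)) / s
    return out


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]   # hypothetical graph with cycles
nbrs, directed = build(edges)
K = fixed_K(nbrs, directed, beta=4.0)
y = [0.2, 0.8, 0.5, 0.1]

# Contraction factor: max over {i, j} of (s_ij - 1) / s_ij, which is < 1.
factor = max(
    (s - 1.0) / s
    for s in (1.0 + sum(K[(u, i)] for u in nbrs[i] if u != j) for i, j in directed)
)

rng = random.Random(1)
ratios = []
for _ in range(100):
    a = {e: rng.uniform(-1.0, 1.0) for e in directed}
    b = {e: rng.uniform(-1.0, 1.0) for e in directed}
    ga, gb = G(nbrs, directed, y, a, K), G(nbrs, directed, y, b, K)
    num = max(abs(ga[e] - gb[e]) for e in directed)
    den = max(abs(a[e] - b[e]) for e in directed)
    ratios.append(num / den)
```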

IV. Convergence Time for Regular Graphs 

In this section, we will study the convergence time of
synchronous consensus propagation. For $\epsilon > 0$, we will say
that an estimate $\hat{x}$ of $\bar{y}$ is $\epsilon$-accurate if

$$\|\hat{x} - \bar{y}\mathbf{1}\|_{2,n} < \epsilon. \tag{6}$$

Here, for integer $m$, we set $\|\cdot\|_{2,m}$ to be the norm on $\mathbb{R}^m$
defined by $\|x\|_{2,m} = \|x\|_2/\sqrt{m}$. We are interested in the
number of iterations required to obtain an $\epsilon$-accurate estimate
of the mean $\bar{y}$. 

Note that we are primarily interested in how the performance
of consensus propagation behaves over a series of
problem instances as we scale the size of the graph. Since
our measure of error (6) is absolute, we require that the set of
values $\{y_i\}$ lie in some bounded set. Without loss of generality,
we will take $y_i \in [0, 1]$, for all $i \in V$. 

A. The Case of Regular Graphs 

We will restrict our analysis of convergence time to cases
where $(V, E)$ is a $d$-regular graph, for $d \geq 2$. Extension of our
analysis to broader classes of graphs remains an open issue.
We will also make simplifying assumptions that $Q_{ij} = 1$,
$\mu_{ij}^{(0)} = y_i$, and $K^{(0)} = [k_0]_{ij}$ for some scalar $k_0 \geq 0$.

In this restricted setting, the subspace of constant $K$ vectors
is invariant under $\mathcal{F}$. This implies that there is some scalar
$k^\beta > 0$ so that $K^\beta = [k^\beta]_{ij}$. This $k^\beta$ is the unique solution
to the fixed point equation

$$k^\beta = \frac{1 + (d-1)k^\beta}{1 + (1 + (d-1)k^\beta)/\beta}. \tag{7}$$

Given a uniform initial condition $K^{(0)} = [k_0]_{ij}$, we can
study the sequence of iterates $\{K^{(t)}\}$ by examining the scalar
sequence $\{k_t\}$, defined by

$$k_t = \frac{1 + (d-1)k_{t-1}}{1 + (1 + (d-1)k_{t-1})/\beta}. \tag{8}$$

In particular, we have $K^{(t)} = [k_t]_{ij}$, for all $t \geq 0$. 
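Clearing denominators in (7) gives the quadratic $(d-1)(k^\beta)^2 + (1 - (d-2)\beta)k^\beta - \beta = 0$; since the product of its roots is $-\beta/(d-1) < 0$, it has exactly one positive root, which is $k^\beta$. The Python sketch below (function names are illustrative) compares that root with the iterates of (8). Starting from $k_0 = 0$ the sequence increases monotonically to $k^\beta$, and starting from $k_0 = k^\beta$ it stays there.

```python
import math


def k_fixed_point(d, beta):
    """Unique positive root of (d-1) k^2 + (1 - (d-2) beta) k - beta = 0."""
    a, b, c = d - 1.0, 1.0 - (d - 2.0) * beta, -beta
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def k_sequence(d, beta, k0, steps):
    """Scalar iterates k_0, k_1, ..., k_steps of (8)."""
    ks = [k0]
    for _ in range(steps):
        s = 1.0 + (d - 1) * ks[-1]
        ks.append(s / (1.0 + s / beta))
    return ks


for d in (2, 3, 4):
    ks = k_sequence(d, beta=10.0, k0=0.0, steps=200)
    print(d, ks[:4], ks[-1], k_fixed_point(d, 10.0))
```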

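Similarly, in this setting, the equations for the evolution of
$\mu^{(t)}$ take the special form

$$\mu_{ij}^{(t)} = \frac{y_i}{1 + (d-1)k_{t-1}} + \left(1 - \frac{1}{1 + (d-1)k_{t-1}}\right) \frac{1}{d-1} \sum_{u \in N(i) \setminus j} \mu_{ui}^{(t-1)}.$$

Defining $\gamma_t = 1/(1 + (d-1)k_t)$, we have, in vector form,

$$\mu^{(t)} = \gamma_{t-1}\hat{y} + (1 - \gamma_{t-1})P\mu^{(t-1)}, \tag{9}$$

where $\hat{y} \in \mathbb{R}^{nd}$ is a vector with $\hat{y}_{ij} = y_i$, and $P \in \mathbb{R}_+^{nd \times nd}$
is a doubly stochastic matrix. The matrix $P$ corresponds to a
Markov chain on the set of directed edges $\vec{E}$. In this chain, a
directed edge $\{i,j\}$ transitions to a directed edge $\{u,i\}$ with
$u \in N(i) \setminus j$, with equal probability assigned to each such
edge. As in (3), we associate each $\mu^{(t)}$ with an estimate $x^{(t)}$
of $\bar{y}$ according to

$$x^{(t)} = \frac{y}{1 + dk_t} + \frac{dk_t}{1 + dk_t} A\mu^{(t)},$$

where $A \in \mathbb{R}_+^{n \times nd}$ is a matrix defined by $(A\mu)_j = \sum_{i \in N(j)} \mu_{ij}/d$.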
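The sketch below builds $P$ for a hypothetical 3-regular graph (node $i$ adjacent to $i \pm 1$ and $i + n/2$, with $n = 8$) and checks three statements from this subsection: $P$ is doubly stochastic, (9) reproduces (2b) when every $K_{ui}$ equals a common value $k$, and the estimate formula with $A$ reproduces (2c). Matrices are stored as dictionaries keyed by directed edges; the graph, $k$, the random data, and the function names are assumptions.

```python
import random


def circulant_3_regular(n):
    """Hypothetical 3-regular graph: i ~ i +/- 1 and i ~ i + n/2 (n even, n >= 6)."""
    return {i: sorted({(i + 1) % n, (i - 1) % n, (i + n // 2) % n}) for i in range(n)}


def edge_chain(nbrs):
    """P as rows: {i, j} -> {u, i} for u in N(i) minus j, each with probability 1/(d-1)."""
    P = {}
    for i in nbrs:
        for j in nbrs[i]:
            succ = [u for u in nbrs[i] if u != j]
            P[(i, j)] = {(u, i): 1.0 / len(succ) for u in succ}
    return P


n, d, k = 8, 3, 2.5
nbrs = circulant_3_regular(n)
P = edge_chain(nbrs)
rng = random.Random(0)
y = [rng.random() for _ in range(n)]
mu = {e: rng.random() for e in P}

# (2b) with K_ui = k on every edge ...
direct = {
    (i, j): (y[i] + k * sum(mu[(u, i)] for u in nbrs[i] if u != j)) / (1.0 + (d - 1) * k)
    for (i, j) in P
}
# ... versus the vector form (9): gamma * y_hat + (1 - gamma) * P mu.
gamma = 1.0 / (1.0 + (d - 1) * k)
vector = {
    e: gamma * y[e[0]] + (1.0 - gamma) * sum(p * mu[f] for f, p in row.items())
    for e, row in P.items()
}

# (2c) with K_ui = k versus y / (1 + dk) + dk / (1 + dk) * (A mu), (A mu)_j = sum_i mu_ij / d.
x_direct = [(y[j] + k * sum(mu[(i, j)] for i in nbrs[j])) / (1.0 + d * k) for j in range(n)]
x_vector = [y[j] / (1.0 + d * k) + d * k / (1.0 + d * k) * sum(mu[(i, j)] for i in nbrs[j]) / d
            for j in range(n)]
```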



B. The Cesaro Mixing Time 

The update equation (|9| suggests that the convergence of 
/i^*) is intimately tied to a notion of mixing time associated 
with P. Let P* be the Cesaro limit 

t-i 

= lim ^ P^/i. 
Define the Cesaro mixing time t* by 



sup 

t>0 



T=0 



2,nd 



Here, || • ||2,nd is the matrix norm induced by the corresponding 
vector norm || • ||2.nrf- Since P is a stochastic matrix, P* is 
well-defined and r* < oo. Note that, in the case where P is 
aperiodic, irreducible, and symmetric, r* corresponds to the 
traditional definition of mixing time: the inverse of the spectral 
gap of P. 
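As an illustration, the sketch below computes $\tau^*$ for the directed-edge chain of an $n$-node cycle ($d = 2$). It relies on an observation added here, not stated in the text. With $d = 2$ each directed edge has exactly one successor, so $P$ is a permutation that splits into two rotations of length $n$. Each block is a cyclic shift $S$ with Cesaro limit $J/n$ ($J$ the all-ones matrix), and because $P$ is periodic the Cesaro limit is genuinely needed. The partial sum $\sum_{\tau=0}^{t}(S^\tau - J/n)$ is circulant, hence normal, so its induced 2-norm is its largest eigenvalue modulus. Those eigenvalues are the discrete Fourier transform of the first row. The sums are periodic in $t$ with period $n$. For even $n$ the result equals $1/\sin(\pi/n) \approx n/\pi$, within the bound $n/\sqrt{2}$ of Theorem 5 in Section V.

```python
import cmath
import math


def cesaro_mixing_time_cycle(n):
    """tau* for the directed-edge chain of an n-node cycle (d = 2)."""
    best = 0.0
    for t in range(1, n + 1):                 # partial sums repeat with period n
        c = [-(t + 1) / n] * n                # first row of sum_{tau<=t} S^tau - (t+1) J / n
        for tau in range(t + 1):
            c[tau % n] += 1.0
        for k in range(n):                    # circulant eigenvalues = DFT of the first row
            lam = sum(c[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n))
            best = max(best, abs(lam))
    return best


for n in (4, 10, 40):
    print(n, cesaro_mixing_time_cycle(n), n / math.sqrt(2))
```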



IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 11, NOVEMBER 2006 






C. Bounds on the Convergence Time 

Let $\gamma_\beta = \lim_{t\uparrow\infty} \gamma_t = 1/(1 + (d-1)k^\beta)$. With an initial
condition $k_0 = k^\beta$, the update equation for $\mu^{(t)}$ becomes

$$\mu^{(t)} = \gamma_\beta \hat{y} + (1 - \gamma_\beta) P \mu^{(t-1)}.$$

Since $\gamma_\beta \in (0, 1)$ and $P$ is stochastic, this iteration is a contraction mapping
with respect to the maximum norm, with contraction factor $1 - \gamma_\beta$. It is easy to show that
$\gamma_\beta$ is monotonically decreasing in $\beta$, and as such, large values
of $\beta$ are likely to result in slower convergence. On the other
hand, Theorem 1 suggests that large values of $\beta$ are required
to obtain accurate estimates of $\bar{y}$. To balance these conflicting
issues, $\beta$ must be appropriately chosen. 
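The monotonicity of $\gamma_\beta$ in $\beta$ can be checked directly, taking $k^\beta$ as the positive root of the quadratic obtained by clearing denominators in (7), namely $(d-1)k^2 + (1 - (d-2)\beta)k - \beta = 0$. The sketch also shows that, for large $\beta$, $\gamma_\beta \approx 1/\sqrt{\beta}$ when $d = 2$ but $\gamma_\beta \approx 1/((d-2)\beta)$ when $d > 2$. One reading of Theorem 3, offered here only as an observation, is that its two choices of $\beta$ both make $\gamma_\beta$ of order $\epsilon/\tau^*$, so the contraction factor $1 - \gamma_\beta$ needs on the order of $\tau^*/\epsilon$ iterations per factor-of-$e$ error reduction. The function name and grid of $\beta$ values are illustrative.

```python
import math


def gamma_beta(d, beta):
    """gamma_beta = 1 / (1 + (d-1) k^beta), with k^beta the positive root of (7)."""
    a, b, c = d - 1.0, 1.0 - (d - 2.0) * beta, -beta
    k = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return 1.0 / (1.0 + (d - 1) * k)


betas = [1.0, 10.0, 100.0, 1e3, 1e4, 1e5, 1e6]
for d in (2, 3, 4):
    gs = [gamma_beta(d, b) for b in betas]
    print(d, [f"{g:.2e}" for g in gs])
```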

A time $t^*$ is said to be an $\epsilon$-convergence time if estimates
$x^{(t)}$ are $\epsilon$-accurate for all $t \geq t^*$. The following theorem,
whose proof is deferred until the appendix, establishes a
bound on the $\epsilon$-convergence time of synchronous consensus
propagation given appropriately chosen $\beta$, as a function of $\epsilon$
and $\tau^*$. 

Theorem 3: Suppose $k_0 \leq k^\beta$. If $d = 2$ there exists a $\beta = \Theta((\tau^*/\epsilon)^2)$
and if $d > 2$ there exists a $\beta = \Theta(\tau^*/\epsilon)$ such that
some $t^* = O((\tau^*/\epsilon)\log(\tau^*/\epsilon))$ is an $\epsilon$-convergence time. 

In the above theorem, $k_0$ is initialized arbitrarily so long as
$k_0 \leq k^\beta$. Typically, one might set $k_0 = 0$ to guarantee this.
Another case of particular interest is when $k_0 = k^\beta$, so that
$k_t = k^\beta$ for all $t \geq 0$. In this case, the following theorem,
whose proof is deferred until the appendix, offers a better
convergence time bound than Theorem 3. 

Theorem 4: Suppose $k_0 = k^\beta$. If $d = 2$ there exists a $\beta = \Theta((\tau^*/\epsilon)^2)$
and if $d > 2$ there exists a $\beta = \Theta(\tau^*/\epsilon)$ such that
some $t^* = O((\tau^*/\epsilon)\log(1/\epsilon))$ is an $\epsilon$-convergence time. 

Theorems 3 and 4 suggest that initializing with $k_0 = k^\beta$
leads to an improvement in convergence time. However, in
our computational experience, we have found that an initial
condition of $k_0 = 0$ consistently results in faster convergence
than $k_0 = k^\beta$. Hence, we suspect that a convergence time
bound of $O((\tau^*/\epsilon)\log(1/\epsilon))$ also holds for the case of $k_0 = 0$.
Proving this remains an open issue. 

D. Adaptive Mixing Time Search 

The choice of $\beta$ is critical in that it determines both convergence
time and ultimate accuracy. This raises the question of
how to choose $\beta$ for a particular graph. The choices posited
in Theorems 3 and 4 require knowledge of $\tau^*$, which may be
difficult to compute and, moreover, depends on knowledge of the
graph topology. This counteracts our purpose of developing a
distributed protocol. 

In order to address this concern, consider Algorithm 2,
which is designed for the case of $d > 2$. It uses a doubling
sequence of guesses $\bar{\tau}$ for the Cesaro mixing time $\tau^*$. Each
guess leads to a choice of $\beta$ and a number of iterations $t^*$.
Note that the algorithm takes $\epsilon > 0$ as input.

Consider applying this procedure to a $d$-regular graph with
fixed $d > 2$ but topology otherwise unspecified. It follows
from Theorem 3 that this procedure has an $\epsilon$-convergence time
of $O((\tau^*/\epsilon)\log(\tau^*/\epsilon))$. An entirely analogous algorithm can
be designed for the case of $d = 2$. 



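Algorithm 2 Synchronous consensus propagation with adaptive mixing time search.
1: $t \leftarrow 0$
2: for $\ell = 0$ to $\infty$ do
3:   $\bar{\tau} \leftarrow 2^{\ell}$
4:   Set $\beta$ and $t^*$ as indicated by Theorem 3, assuming $\tau^* = \bar{\tau}$
5:   for $s = 1$ to $t^*$ do
6:     Perform one synchronous consensus propagation iteration (3) with parameter $\beta$
7:     $t \leftarrow t + 1$
8:   end for
9: end for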



We expect that many variations of this procedure can be 
made effective. Asynchronous versions would involve each 
node adapting a local estimate of the mixing time. 
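Why doubling does not hurt the bound can be seen from the phase lengths alone. The sketch below lists $(\bar{\tau}, \beta, t^*)$ for each phase of Algorithm 2 until the guess first reaches a given $\tau^*$. It uses $\beta = c_\beta \bar{\tau}/\epsilon$ and $t^* = \lceil c_t (\bar{\tau}/\epsilon)\log(\bar{\tau}/\epsilon) \rceil$, with hypothetical constants $c_\beta = c_t = 1$ standing in for the unspecified constants of Theorem 3 ($d > 2$), and it starts from $\bar{\tau} = 1$. Because $\bar{\tau}$ doubles, the total number of iterations is less than twice the last phase plus the number of phases, and the last guess is less than $2\tau^*$. This is an accounting sketch, not an implementation of the protocol; the function name is illustrative.

```python
import math


def doubling_phases(eps, tau_true, c_beta=1.0, c_t=1.0):
    """(tau_bar, beta, t_star) per phase, stopping at the first guess tau_bar >= tau_true.

    c_beta and c_t are hypothetical stand-ins for the constants hidden in Theorem 3.
    """
    phases, ell = [], 0
    while True:
        tau_bar = 2.0 ** ell
        beta = c_beta * tau_bar / eps
        t_star = math.ceil(c_t * (tau_bar / eps) * math.log(tau_bar / eps))
        phases.append((tau_bar, beta, t_star))
        if tau_bar >= tau_true:
            return phases
        ell += 1


phases = doubling_phases(eps=0.05, tau_true=300.0)
total = sum(t for _, _, t in phases)
print(len(phases), phases[-1], total)
```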

V. Comparison with Pairwise Averaging 



Using the results of Section IV, we can compare the performance
of consensus propagation to that of pairwise averaging.
Pairwise averaging is usually defined in an asynchronous
setting, but there is a synchronous counterpart which works
as follows. Consider a doubly stochastic symmetric matrix
$P \in \mathbb{R}_+^{n \times n}$ such that $P_{ij} = 0$ if $i \neq j$ and $(i,j) \notin E$.
Evolve estimates according to $x^{(t)} = Px^{(t-1)}$, initialized with
$x^{(0)} = y$. Here, at each time $t$, a node $i$ computes a new
estimate $x_i^{(t)}$ which is a weighted average of the estimates at node $i$ and
its neighbors during the previous time-step. If the matrix $P$ is
aperiodic and irreducible, then $x^{(t)} = P^t y \to \bar{y}\mathbf{1}$ as $t \uparrow \infty$. 
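For comparison with the consensus propagation sketches, here is synchronous pairwise averaging on an $n$-node cycle with the hypothetical choice $P_{ii} = 1/2$ and $P_{i,i\pm1} = 1/4$ (symmetric, doubly stochastic, and aperiodic). The sketch counts the iterations needed for $\epsilon$-accuracy in the $\|\cdot\|_{2,n}$ norm from a half-ones, half-zeros initial vector. Doubling $n$ multiplies the count by roughly four, consistent with the quadratic growth of $\tau_2$ on the cycle discussed below. The data, $\epsilon$, and function names are illustrative.

```python
def pairwise_averaging_step(x):
    """x <- P x on a cycle with P_ii = 1/2 and P_{i,i+-1} = 1/4 (hypothetical P)."""
    n = len(x)
    return [0.5 * x[i] + 0.25 * x[(i - 1) % n] + 0.25 * x[(i + 1) % n] for i in range(n)]


def steps_to_accuracy(y, eps):
    """First t with ||x^(t) - ybar 1||_{2,n} < eps, starting from x^(0) = y."""
    n = len(y)
    ybar = sum(y) / n
    x, t = list(y), 0
    while (sum((xi - ybar) ** 2 for xi in x) / n) ** 0.5 >= eps:
        x = pairwise_averaging_step(x)
        t += 1
    return t


# Hypothetical data: y_i = 1 on half of the cycle and 0 on the other half.
for n in (10, 20, 40):
    y = [1.0 if i < n // 2 else 0.0 for i in range(n)]
    print(n, steps_to_accuracy(y, 0.01))
```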

In the case of a singly-connected graph, synchronous consensus
propagation converges exactly in a number of iterations
equal to the diameter of the graph. Moreover, when $\beta = \infty$,
this convergence is to the exact mean, as discussed in
Section II-A. This is the best one can hope for under any
algorithm, since the diameter is the minimum amount of
time required for a message to travel between the two most
distant nodes. On the other hand, for a fixed accuracy $\epsilon$,
the worst-case number of iterations required by synchronous
pairwise averaging on a singly-connected graph scales at least
quadratically in the diameter [34]. 

The rate of convergence of synchronous pairwise averaging
is governed by the relation $\|x^{(t)} - \bar{y}\mathbf{1}\|_{2,n} \leq \lambda_2^t$, where $\lambda_2$ is
the second largest eigenvalue¹ of $P$. Let $\tau_2 = 1/\log(1/\lambda_2)$,
and call it the mixing time of $P$. In order to guarantee $\epsilon$-accuracy
(independent of $y$), $t > \tau_2 \log(1/\epsilon)$ suffices and
$t = \Omega(\tau_2 \log(1/\epsilon))$ is required. 

Consider $d$-regular graphs and fix a desired error tolerance
$\epsilon$. The number of iterations required by consensus propagation
is $O(\tau^* \log \tau^*)$, whereas that required by pairwise averaging
is $\Theta(\tau_2)$. Both mixing times depend on the size and topology
of the graph. $\tau_2$ is the mixing time of a process on nodes that
transitions along edges, whereas $\tau^*$ is the mixing time of a
process on directed edges that transitions towards nodes. An
important distinction is that the former process is allowed to
"backtrack" whereas the latter is not. By this we mean that
a sequence of states $(i, j, i)$, with $j \in N(i)$, can be observed in the vertex
process, but the sequence $(\{i,j\}, \{j,i\})$ cannot be observed
in the edge process. As we will now illustrate through an
example, it is this difference that makes $\tau_2$ larger than $\tau^*$
and, therefore, pairwise averaging less efficient than consensus
propagation.

¹ Here, we take the standard approach of ignoring the smallest eigenvalue of
$P$. We will assume that this eigenvalue is smaller than $\lambda_2$ in magnitude. Note
that a constant probability can be added to each self-loop of any particular
matrix $P$ so that this is true.

In the case of a cycle ($d = 2$) with an even number
of nodes $n$, minimizing the mixing time over $P$ results in
$\tau_2 = \Theta(n^2)$ [35], [17], [36]. For comparison, as demonstrated
in the following theorem (whose proof is deferred until the
appendix), $\tau^*$ is linear in $n$. 

Theorem 5: For the cycle with $n$ nodes, $\tau^* \leq n/\sqrt{2}$.

Intuitively, the improvement in mixing time arises from the
fact that the edge process moves around the cycle in a single
direction and therefore travels distance $t$ in order $t$ iterations.
The vertex process, on the other hand, is "diffusive" in
nature. It randomly transitions back and forth among adjacent
nodes, and requires order $t^2$ iterations to travel distance $t$.
Non-diffusive methods have previously been suggested in the
design of efficient algorithms for Markov chain sampling (see
[37] and references therein). 
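The contrast between the two processes can be made concrete. The sketch below computes, exactly with rational arithmetic, the mean squared displacement of a simple random walk on the integers. The walk is a stand-in for a diffusive vertex process (the actual vertex process depends on $P$), and its mean squared displacement equals $t$ after $t$ steps, so reaching distance $t$ takes on the order of $t^2$ steps. The sketch also traces the edge chain $\{i,j\} \to \{u,i\}$, $u \in N(i) \setminus j$, on a cycle. Since $d = 2$ leaves exactly one successor, that chain advances one node per step in a fixed direction. Function names are illustrative.

```python
from fractions import Fraction


def srw_mean_square_displacement(t):
    """E[X_t^2] for a simple random walk on the integers, computed exactly."""
    dist = {0: Fraction(1)}
    for _ in range(t):
        nxt = {}
        for x, p in dist.items():
            for step in (-1, 1):
                nxt[x + step] = nxt.get(x + step, Fraction(0)) + p / 2
        dist = nxt
    return sum(p * x * x for x, p in dist.items())


def edge_chain_path_on_cycle(n, t, start=(1, 0)):
    """Nodes i visited by the edge chain {i, j} -> {u, i}, u in N(i) minus j, on an n-cycle."""
    i, j = start
    path = [i]
    for _ in range(t):
        (u,) = [v for v in ((i - 1) % n, (i + 1) % n) if v != j]   # unique when d = 2
        i, j = u, i
        path.append(i)
    return path


print([srw_mean_square_displacement(t) for t in range(6)])
print(edge_chain_path_on_cycle(12, 14))
```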

The cycle example demonstrates a $\Theta(n/\log n)$ advantage
offered by consensus propagation. Comparisons of mixing
times associated with other graph topologies remain an issue
for future analysis. Let us close by speculating on a uniform
grid of $n$ nodes over the $m$-dimensional unit torus. Here,
$n^{1/m}$ is an integer, and each vertex has $2m$ neighbors, each
a distance $n^{-1/m}$ away. With $P$ optimized, it can be shown
that $\tau_2 = \Theta(n^{2/m})$ [38]. We put forth a conjecture on $\tau^*$. 

Conjecture 1: For the $m$-dimensional torus with $n$ nodes,
$\tau^* = \Theta(n^{(2m-1)/m^2})$.
Appendix I
Proof of Theorem 2

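Given initial vectors $(\mu^{(0)}, K^{(0)})$, and a sequence of communication sets $\{U_1, U_2, \ldots\}$, the consensus propagation algorithm evolves parameter values over time according to

$$K^{(t)}_{ij} = \begin{cases} \mathcal{F}_{ij}(K^{(t-1)}), & \text{if } (i,j) \in U_t, \\ K^{(t-1)}_{ij}, & \text{otherwise}, \end{cases} \tag{10}$$

$$\mu^{(t)}_{ij} = \begin{cases} \mathcal{G}_{ij}(\mu^{(t-1)}, K^{(t-1)}), & \text{if } (i,j) \in U_t, \\ \mu^{(t-1)}_{ij}, & \text{otherwise}, \end{cases} \tag{11}$$

for times $t > 0$.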
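A minimal Python sketch of the asynchronous updates (10) and (11) follows. It assumes the forms $\mathcal{F}_{ij}(K) = s/(1 + s/\beta)$ and $\mathcal{G}_{ij}(\mu, K) = (y_i + \sum_{u \in N(i)\setminus j} K_{ui}\mu_{ui})/s$ with $s = 1 + \sum_{u \in N(i)\setminus j} K_{ui}$, and $\mathcal{X}_i$ defined analogously over all of $N(i)$; the random communication sets and all function names are illustrative. As a sanity check it relies on Theorem 2(iii), taking as an assumption that the mode of $p^\beta$ satisfies $(x_i - y_i) + \beta\sum_{j \in N(i)}(x_i - x_j) = 0$ for every node $i$.

```python
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # directed edge (i, j): the message from node i to node j


def cp_step(nbrs: Dict[int, List[int]], y: List[float], beta: float,
            K: Dict[Edge, float], mu: Dict[Edge, float],
            active: Set[Edge]) -> Tuple[Dict[Edge, float], Dict[Edge, float]]:
    """One time step of (10)-(11): edges in `active` (the set U_t) are recomputed
    from the previous (K, mu); every other edge keeps its previous value."""
    new_K, new_mu = dict(K), dict(mu)
    for (i, j) in active:
        others = [u for u in nbrs[i] if u != j]  # N(i) minus j
        s = 1.0 + sum(K[(u, i)] for u in others)
        new_K[(i, j)] = s / (1.0 + s / beta)  # F_ij(K)
        new_mu[(i, j)] = (y[i] + sum(K[(u, i)] * mu[(u, i)] for u in others)) / s  # G_ij
    return new_K, new_mu


def cp_estimate(nbrs: Dict[int, List[int]], y: List[float],
                K: Dict[Edge, float], mu: Dict[Edge, float]) -> List[float]:
    """x_i = X_i(mu, K): combine y_i with the messages from all neighbors of i."""
    x = []
    for i in range(len(y)):
        s = 1.0 + sum(K[(u, i)] for u in nbrs[i])
        x.append((y[i] + sum(K[(u, i)] * mu[(u, i)] for u in nbrs[i])) / s)
    return x


# Usage: a 5-node cycle with random communication sets U_t.
n, beta = 5, 2.0
nbrs = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
edges = [(i, j) for i in nbrs for j in nbrs[i]]
y = [3.0, -1.0, 4.0, 1.0, 5.0]
K = {e: 0.0 for e in edges}
mu = {e: 0.0 for e in edges}
rng = random.Random(0)
for t in range(1, 2001):
    U_t = {e for e in edges if rng.random() < 0.5}
    K, mu = cp_step(nbrs, y, beta, K, mu, U_t)
x = cp_estimate(nbrs, y, K, mu)
print([round(v, 4) for v in x])
```

Because every update reads only time-$(t-1)$ values, the order in which edges of $U_t$ are processed does not matter.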

In order to establish Theorem 2, we will first study convergence of the inverse variance parameters $K^{(t)}$ and subsequently the mean parameters $\mu^{(t)}$.



A. Convergence of Inverse Variance Updates 

Our analysis of the convergence of the inverse variance 
parameters follows the work in [18]. We begin with a fun- 
damental lemma. 

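Lemma 1: For each $(i,j) \in \vec{E}$, the following facts hold:

(i) The function $\mathcal{F}_{ij}(\cdot)$ is continuous.

(ii) The function $\mathcal{F}_{ij}(\cdot)$ is monotonic. That is, if $K \le K'$, where the inequality is interpreted component-wise, then $\mathcal{F}_{ij}(K) \le \mathcal{F}_{ij}(K')$.

(iii) If $K' = \mathcal{F}(K)$, then $0 < K'_{ij} < \beta$.

(iv) If $\alpha > 1$, then $\alpha\mathcal{F}_{ij}(K) > \mathcal{F}_{ij}(\alpha K)$.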
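Proof: Define the function $f : \mathbb{R}_+ \to \mathbb{R}_+$ by
$$f(x) = \frac{1}{\gamma + 1/x},$$
where $\gamma > 0$. With $\gamma = 1/\beta$, we have $\mathcal{F}_{ij}(K) = f\big(1 + \sum_{u \in N(i)\setminus j} K_{ui}\big)$. (i) follows from the fact that $f$ is continuous, (ii) follows from the fact that $f(x)$ is strictly increasing, and (iii) follows from the fact that $f(x) \in (0, 1/\gamma)$ for all $x > 0$. (iv) follows from the fact that $\alpha f(x) > f(\alpha x)$ for $\alpha > 1$, together with the monotonicity of $f$ and the inequality $1 + \alpha s \le \alpha(1 + s)$ for $s \ge 0$. ■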
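The properties of $f$ invoked in this proof are easy to spot-check numerically. The Python sketch below evaluates $f(x) = 1/(\gamma + 1/x)$ on a grid for one illustrative value of $\gamma$; it is a sanity check on a finite grid, not a proof.

```python
def f(x: float, gamma: float) -> float:
    """f(x) = 1 / (gamma + 1/x) for x > 0."""
    return 1.0 / (gamma + 1.0 / x)


gamma = 0.5                                # illustrative; 1/gamma plays the role of beta
xs = [0.01 * k for k in range(1, 2001)]    # grid on (0, 20]
vals = [f(x, gamma) for x in xs]

increasing = all(a < b for a, b in zip(vals, vals[1:]))          # used for (ii)
bounded = all(0.0 < v < 1.0 / gamma for v in vals)               # used for (iii)
subhomogeneous = all(a * f(x, gamma) > f(a * x, gamma)           # used for (iv)
                     for x in xs[::50] for a in (1.1, 2.0, 10.0))
print(increasing, bounded, subhomogeneous)  # True True True
```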

Now we consider the sequence of iterates $\{K^{(0)}, K^{(1)}, \ldots\}$ which evolve according to (10).

Lemma 2: Let $K^{(0)}$ be such that $\mathcal{F}_{ij}(K^{(0)}) \ge K^{(0)}_{ij}$ for all $(i,j) \in \vec{E}$ (for example, $K^{(0)} = 0$). Then $K^{(t)}$ converges to a vector $\hat{K}$ such that $\hat{K} = \mathcal{F}(\hat{K})$.

Proof: Convergence follows from the fact that the iterates are component-wise bounded and monotonic. The limit point must be a fixed point by continuity. ■

Given the existence of a single fixed point, we can establish 
that the fixed point must be unique. 

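Lemma 3: The $\mathcal{F}$ operator has a unique fixed point $K^\beta$.
Proof: Denote by $K^\beta$ the fixed point obtained by iterating with initial condition $K^{(0)} = 0$, and let $K'$ be some other fixed point. It is clear that $0 \le K'$; thus, by monotonicity, we must have $K^\beta \le K'$. Define
$$\gamma = \inf\{\alpha \in [1, \infty) : K' \le \alpha K^\beta\}.$$
It is clear that $\gamma$ is well-defined since $0 < K^\beta_{ij}, K'_{ij} < \beta$. Also, we must have $\gamma > 1$, since $K^\beta \ne K'$. Then,
$$K'_{ij} = \mathcal{F}_{ij}(K') \le \mathcal{F}_{ij}(\gamma K^\beta) < \gamma\mathcal{F}_{ij}(K^\beta) = \gamma K^\beta_{ij}.$$
Since there are finitely many components, $K' \le \alpha K^\beta$ then holds for some $\alpha < \gamma$, which contradicts the definition of $\gamma$. Hence, there is a unique fixed point. ■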
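Lemma 4: Given an arbitrary initial condition $K^{(0)} \in \mathbb{R}_+^{|\vec{E}|}$,
$$\lim_{t\to\infty} K^{(t)} = K^\beta.$$

Proof: If $0 \le K^{(0)} \le K^\beta$, the result holds by monotonicity. Assume that $K^\beta \le K^{(0)}$. Define
$$\gamma = \inf\{\alpha \in [1, \infty) : K^{(0)} \le \alpha K^\beta\}.$$
Then,
$$\mathcal{F}_{ij}(K^{(0)}) \le \mathcal{F}_{ij}(\gamma K^\beta) \le \gamma\mathcal{F}_{ij}(K^\beta) = \gamma K^\beta_{ij}.$$
Define a sequence $\{\bar{K}^{(t)}\}$ by $\bar{K}^{(0)} = \gamma K^\beta$ and, for all $(i,j) \in \vec{E}$, $t > 0$,
$$\bar{K}^{(t)}_{ij} = \begin{cases} \mathcal{F}_{ij}(\bar{K}^{(t-1)}), & \text{if } (i,j) \in U_t, \\ \bar{K}^{(t-1)}_{ij}, & \text{otherwise}. \end{cases}$$
Since $\mathcal{F}_{ij}(\bar{K}^{(0)}) \le \gamma K^\beta_{ij} = \bar{K}^{(0)}_{ij}$, the sequence $\{\bar{K}^{(t)}\}$ is monotonically decreasing and must have a limit which is a fixed point. Since the fixed point is unique, we have $\bar{K}^{(t)} \to K^\beta$. But $K^\beta \le K^{(0)} \le \bar{K}^{(0)}$. By monotonicity, we also have $K^{(t)} \to K^\beta$.

Now, consider the case of general $K^{(0)}$. Define $\underline{K}$ and $\bar{K}$ such that $\underline{K} \le K^{(0)} \le \bar{K}$ and $\underline{K} \le K^\beta \le \bar{K}$. By the previous two cases and monotonicity, we again have $K^{(t)} \to K^\beta$. ■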
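On a $d$-regular graph with synchronous updates and a constant initial condition, every $K^{(t)}_{ij}$ equals a common scalar $k_t$, and (10) reduces to a one-dimensional recursion (this symmetric reduction, and the form of $\mathcal{F}$ it uses, are assumptions made for illustration). The Python sketch below iterates the recursion from $k_0 = 0$, as in Lemma 2, and from a point above the fixed point, as in the proof of Lemma 4, and checks that both runs are monotone and reach the same limit.

```python
from typing import List


def F_regular(k: float, beta: float, d: int) -> float:
    """Symmetric reduction of F_ij when all K_ui = k on a d-regular graph:
    s / (1 + s / beta) with s = 1 + (d - 1) * k."""
    s = 1.0 + (d - 1) * k
    return s / (1.0 + s / beta)


def iterate(k0: float, beta: float, d: int, steps: int) -> List[float]:
    ks = [k0]
    for _ in range(steps):
        ks.append(F_regular(ks[-1], beta, d))
    return ks


beta, d = 5.0, 3
low = iterate(0.0, beta, d, 200)             # Lemma 2: start at 0, increasing
high = iterate(3.0 * low[-1], beta, d, 200)  # Lemma 4: start above K^beta, decreasing
print(low[:4], high[:4], low[-1], high[-1])
```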

B. Convergence of Mean Updates 

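In this section, we will consider certain properties of the updates for the mean parameters. Define the operator $\mathcal{G}(\cdot, K)$ to be the synchronous update of all components of the mean vector according to
$$\mathcal{G}_{ij}(\mu, K) = \frac{y_i + \sum_{u \in N(i)\setminus j} K_{ui}\mu_{ui}}{1 + \sum_{u \in N(i)\setminus j} K_{ui}}, \qquad (i,j) \in \vec{E}.$$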

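Lemma 5: There exists $\alpha \in (0,1)$ so that

(i) For all $\mu, \mu' \in \mathbb{R}^{|\vec{E}|}$,
$$\|\mathcal{G}(\mu, K^\beta) - \mathcal{G}(\mu', K^\beta)\|_\infty \le \alpha\|\mu - \mu'\|_\infty.$$

(ii) If $t$ is sufficiently large, for all $\mu, \mu' \in \mathbb{R}^{|\vec{E}|}$,
$$\|\mathcal{G}(\mu, K^{(t)}) - \mathcal{G}(\mu', K^{(t)})\|_\infty \le \alpha\|\mu - \mu'\|_\infty.$$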



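Proof: Set
$$\hat{\alpha} = \max_{(i,j) \in \vec{E}} \frac{\sum_{u \in N(i)\setminus j} K^\beta_{ui}}{1 + \sum_{u \in N(i)\setminus j} K^\beta_{ui}}.$$
Since $\mathcal{G}_{ij}(\mu, K^\beta)$ is a convex combination of $y_i$ and the $\mu_{ui}$, $u \in N(i)\setminus j$, we have $|\mathcal{G}_{ij}(\mu, K^\beta) - \mathcal{G}_{ij}(\mu', K^\beta)| \le \hat{\alpha}\|\mu - \mu'\|_\infty$. Observing that $\hat{\alpha} < 1$, Part (i) follows for any $\alpha \in [\hat{\alpha}, 1)$. Define
$$\alpha_t = \max_{(i,j) \in \vec{E}} \frac{\sum_{u \in N(i)\setminus j} K^{(t)}_{ui}}{1 + \sum_{u \in N(i)\setminus j} K^{(t)}_{ui}}.$$
Since $K^{(t)} \to K^\beta$, by continuity $\alpha_t \to \hat{\alpha} < 1$. Fixing any $\alpha \in (\hat{\alpha}, 1)$, we have $\alpha_t \le \alpha$ for all $t$ sufficiently large, and Part (ii) follows. ■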
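The contraction modulus in this proof can be computed directly. The Python sketch below builds $\mathcal{G}(\cdot, K)$ on the complete graph with four nodes for an arbitrary positive $K$ (randomly drawn, purely illustrative), computes the maximum over directed edges of $\sum K_{ui}/(1 + \sum K_{ui})$, and checks the max-norm Lipschitz bound on random pairs $\mu, \mu'$.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def G(nbrs: Dict[int, List[int]], y: List[float],
      K: Dict[Edge, float], mu: Dict[Edge, float]) -> Dict[Edge, float]:
    """Synchronous mean update G(mu, K) over all directed edges."""
    out = {}
    for (i, j) in K:
        others = [u for u in nbrs[i] if u != j]
        s = 1.0 + sum(K[(u, i)] for u in others)
        out[(i, j)] = (y[i] + sum(K[(u, i)] * mu[(u, i)] for u in others)) / s
    return out


def contraction_modulus(nbrs: Dict[int, List[int]], K: Dict[Edge, float]) -> float:
    """Max over (i, j) of S / (1 + S), where S sums K_ui over N(i) minus j."""
    best = 0.0
    for (i, j) in K:
        s = sum(K[(u, i)] for u in nbrs[i] if u != j)
        best = max(best, s / (1.0 + s))
    return best


rng = random.Random(1)
n = 4
nbrs = {i: [j for j in range(n) if j != i] for i in range(n)}
edges = [(i, j) for i in nbrs for j in nbrs[i]]
K = {e: rng.uniform(0.1, 5.0) for e in edges}
y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
alpha = contraction_modulus(nbrs, K)

worst = 0.0
for _ in range(200):
    mu1 = {e: rng.uniform(-10.0, 10.0) for e in edges}
    mu2 = {e: rng.uniform(-10.0, 10.0) for e in edges}
    g1, g2 = G(nbrs, y, K, mu1), G(nbrs, y, K, mu2)
    num = max(abs(g1[e] - g2[e]) for e in edges)
    den = max(abs(mu1[e] - mu2[e]) for e in edges)
    worst = max(worst, num / den)
print(alpha, worst)  # the observed ratio stays below alpha
```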

Lemma 5 states that $\mathcal{G}(\cdot, K^\beta)$ is a maximum norm contraction. This leads to the following lemma.

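Lemma 6: The following hold:

(i) There is a unique fixed point $\mu^\beta$ such that
$$\mu^\beta = \mathcal{G}(\mu^\beta, K^\beta).$$

(ii) There exists $T_1$ such that if $t \ge T_1$, the operator $\mathcal{G}(\cdot, K^{(t)})$ has a unique fixed point $\nu^{(t)}$. That is,
$$\nu^{(t)} = \mathcal{G}(\nu^{(t)}, K^{(t)}).$$

(iii) For any $\epsilon > 0$, there exists $T_2 \ge T_1$ so that if $t \ge T_2$, then $\|\nu^{(t)} - \mu^\beta\|_\infty < \epsilon$.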

Proof: For Part (i), since $\mathcal{G}(\cdot, K^\beta)$ is a maximum norm contraction, existence of a unique fixed point $\mu^\beta$ follows from, for example, Proposition 3.1.1 in [32]. Part (ii) is established similarly.

For Part (iii), note that for $t$ sufficiently large, the linear system of equations
$$\nu = \mathcal{G}(\nu, K^{(t)})$$
over $\nu \in \mathbb{R}^{|\vec{E}|}$ is non-singular, by Part (ii). Since $K^{(t)} \to K^\beta$, the coefficients of this system of equations continuously converge to those of
$$\nu = \mathcal{G}(\nu, K^\beta).$$
Then, we must have $\nu^{(t)} \to \mu^\beta$. ■

C. Overall Convergence 

We are now ready to prove Theorem 2.

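Theorem 2: Assume that the communication sets $\{U_t\}$ have the property that every directed edge $(i,j) \in \vec{E}$ appears in $U_t$ for infinitely many $t$. The following hold:

(i) There are unique vectors $(\mu^\beta, K^\beta)$ such that
$$K^\beta = \mathcal{F}(K^\beta), \quad \text{and} \quad \mu^\beta = \mathcal{G}(\mu^\beta, K^\beta).$$

(ii) Independent of the initial condition $(\mu^{(0)}, K^{(0)})$,
$$\lim_{t\to\infty} K^{(t)} = K^\beta, \quad \text{and} \quad \lim_{t\to\infty} \mu^{(t)} = \mu^\beta.$$

(iii) Given $(\mu^\beta, K^\beta)$, if $x^\beta = \mathcal{X}(\mu^\beta, K^\beta)$, then $x^\beta$ is the mode of the distribution $p^\beta(\cdot)$.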

Proof: Existence and uniqueness of the fixed point $K^\beta$ and convergence of the vector $K^{(t)}$ to $K^\beta$ follow from Lemmas 3 and 4, respectively. Existence and uniqueness of the fixed point $\mu^\beta$ follows from Lemma 6.

To establish the balance of Part (ii), we need to show that $\mu^{(t)} \to \mu^\beta$. We will use a variant of the "box condition" argument of Proposition 6.2.1 in [32].

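Fix any $\epsilon > 0$. By Lemma 6, pick $T_2$ so that if $t \ge T_2$, then $\nu^{(t)}$ exists with $\nu^{(t)} = \mathcal{G}(\nu^{(t)}, K^{(t)})$ and $\|\nu^{(t)} - \mu^\beta\|_\infty < \epsilon$; taking $T_2$ larger if necessary, Lemma 5(ii) also applies for $t \ge T_2$. For $t > T_2$, if $(i,j) \in U_t$,
$$\begin{aligned}
|\mu^{(t)}_{ij} - \mu^\beta_{ij}| &\le |\mu^{(t)}_{ij} - \nu^{(t-1)}_{ij}| + |\nu^{(t-1)}_{ij} - \mu^\beta_{ij}| \\
&= |\mathcal{G}_{ij}(\mu^{(t-1)}, K^{(t-1)}) - \mathcal{G}_{ij}(\nu^{(t-1)}, K^{(t-1)})| + |\nu^{(t-1)}_{ij} - \mu^\beta_{ij}| \\
&\le \alpha\|\mu^{(t-1)} - \nu^{(t-1)}\|_\infty + \|\nu^{(t-1)} - \mu^\beta\|_\infty \\
&\le \alpha\|\mu^{(t-1)} - \mu^\beta\|_\infty + (1+\alpha)\|\nu^{(t-1)} - \mu^\beta\|_\infty \\
&\le \alpha\|\mu^{(t-1)} - \mu^\beta\|_\infty + (1+\alpha)\epsilon.
\end{aligned} \tag{12}$$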

For $k \ge 0$, define $A_k$ to be the set of vectors $\mu \in \mathbb{R}^{|\vec{E}|}$ such that
$$\|\mu - \mu^\beta\|_\infty \le \alpha^k\|\mu^{(T_2)} - \mu^\beta\|_\infty + \frac{(1+\alpha)\epsilon}{1-\alpha}.$$

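We would like to show that for every $k \ge 0$, there is a time $t_k$ such that $\mu^{(t)} \in A_k$ for all $t \ge t_k$. We proceed by induction.

When $k = 0$, set $t_0 = T_2$. Clearly $\mu^{(T_2)} \in A_0$. Assume that $\mu^{(t-1)} \in A_0$, for some $t > T_2$. Then, if $(i,j) \in U_t$, from (12),
$$\begin{aligned}
|\mu^{(t)}_{ij} - \mu^\beta_{ij}| &\le \alpha\|\mu^{(t-1)} - \mu^\beta\|_\infty + (1+\alpha)\epsilon \\
&\le \alpha\|\mu^{(T_2)} - \mu^\beta\|_\infty + \alpha\frac{1+\alpha}{1-\alpha}\epsilon + (1+\alpha)\epsilon \\
&\le \|\mu^{(T_2)} - \mu^\beta\|_\infty + \frac{1+\alpha}{1-\alpha}\epsilon.
\end{aligned}$$
If $(i,j) \notin U_t$, then $|\mu^{(t)}_{ij} - \mu^\beta_{ij}| = |\mu^{(t-1)}_{ij} - \mu^\beta_{ij}|$, which satisfies the same bound. Thus, $\mu^{(t)} \in A_0$. By induction, $\mu^{(t)} \in A_0$ for all $t \ge T_2$.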

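Now, assume that $t_{k-1}$ exists, for some $k - 1 \ge 0$. Let $t > t_{k-1}$ be some time such that $(i,j) \in U_t$. Then, by (12) and the fact that $\mu^{(t-1)} \in A_{k-1}$,
$$\begin{aligned}
|\mu^{(t)}_{ij} - \mu^\beta_{ij}| &\le \alpha\|\mu^{(t-1)} - \mu^\beta\|_\infty + (1+\alpha)\epsilon \\
&\le \alpha^k\|\mu^{(T_2)} - \mu^\beta\|_\infty + \frac{1+\alpha}{1-\alpha}\epsilon.
\end{aligned}$$
Once component $(i,j)$ has been updated after $t_{k-1}$, it keeps satisfying this bound, since at each later time it is either updated again (with the same bound) or left unchanged. For each $(i,j) \in \vec{E}$, let $\tau^{(k)}_{ij} > t_{k-1}$ be the earliest time after $t_{k-1}$ such that $(i,j) \in U_{\tau^{(k)}_{ij}}$. If we set $t_k$ to be the largest of these times, we have $\mu^{(t)} \in A_k$, for all $t \ge t_k$.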
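We have established that
$$\limsup_{t\to\infty}\|\mu^{(t)} - \mu^\beta\|_\infty \le \alpha^k\|\mu^{(T_2)} - \mu^\beta\|_\infty + \frac{1+\alpha}{1-\alpha}\epsilon,$$
for all $k \ge 0$. Taking a limit as $k \to \infty$, we have
$$\limsup_{t\to\infty}\|\mu^{(t)} - \mu^\beta\|_\infty \le \frac{1+\alpha}{1-\alpha}\epsilon.$$
Since $\epsilon$ was arbitrary, we have the convergence $\mu^{(t)} \to \mu^\beta$.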

Part (iii) follows from the fact that Gaussian belief propa- 
gation, when it converges, computes exact means [28], [18], 
[33]. ■ 

Appendix II
Proofs of Theorems 3 and 4

In this section, we will prove Theorems 3 and 4. We will start with some preliminary lemmas.

A. Preliminary Lemmas 

The following lemma provides bounds on $k^\beta$ and $\gamma^\beta$ in terms of $\beta$.

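Lemma 7: If $d = 2$,
$$2\sqrt{\beta} - 1/2 \le k^\beta \le 2\sqrt{\beta}, \qquad \frac{1}{2\sqrt{\beta} + 1} \le \gamma^\beta \le \frac{1}{2\sqrt{\beta} + 1/2}.$$
If $d > 2$,
$$\frac{1}{1 + (d-1)\beta} \le \gamma^\beta \le \frac{1}{(d-2)\beta}.$$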
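Proof: Starting with the fixed point equation (7), some algebra leads to a quadratic equation in $k^\beta$. The quadratic formula gives us
$$k^\beta = \frac{\beta}{2} - \frac{1+\beta}{2(d-1)} + \sqrt{\left(\frac{\beta}{2} - \frac{1+\beta}{2(d-1)}\right)^2 + \frac{\beta}{d-1}},$$
from which it is easy to derive the desired bounds.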
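The closed form can be checked against the fixed-point equation numerically. The Python sketch below assumes the symmetric regular-graph recursion $k \mapsto s/(1 + s/\beta)$ with $s = 1 + (d-1)k$, and takes $\gamma^\beta = 1/(1 + (d-1)k^\beta)$, consistent with the map $f$ used in Lemma 10; it then checks the $d > 2$ bounds of Lemma 7 for a few illustrative $(\beta, d)$ pairs.

```python
import math


def k_beta(beta: float, d: int) -> float:
    """Positive root given by the quadratic formula in the proof of Lemma 7 (d >= 2)."""
    b = beta / 2 - (1 + beta) / (2 * (d - 1))
    return b + math.sqrt(b * b + beta / (d - 1))


def F_regular(k: float, beta: float, d: int) -> float:
    """Assumed symmetric form of the inverse-variance update on a d-regular graph."""
    s = 1.0 + (d - 1) * k
    return s / (1.0 + s / beta)


for beta, d in [(0.5, 3), (5.0, 3), (40.0, 4), (100.0, 6)]:
    k = k_beta(beta, d)
    g = 1.0 / (1.0 + (d - 1) * k)  # gamma^beta under the stated assumption
    within = 1 / (1 + (d - 1) * beta) <= g <= 1 / ((d - 2) * beta)
    print(beta, d, k, F_regular(k, beta, d) - k, within)
```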



The following lemma offers useful expressions for the fixed point $\mu^\beta$ and the mode $x^\beta$.



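Lemma 8:
$$\mu^\beta = \sum_{\tau=0}^{\infty}\gamma^\beta(1-\gamma^\beta)^\tau P^\tau y, \tag{13a}$$
$$x^\beta = \frac{1}{1 + dk^\beta}\,y + \frac{dk^\beta}{1 + dk^\beta}\,A\mu^\beta. \tag{13b}$$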



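Proof: If we consider the algorithm when $k_0 = k^\beta$, then $k_t = k^\beta$ and $\gamma_t = \gamma^\beta$ for all $t \ge 0$. Then, using (9) and induction, we have
$$\mu^{(t)} = \sum_{\tau=0}^{t-1}\gamma^\beta(1-\gamma^\beta)^\tau P^\tau y + (1-\gamma^\beta)^t P^t\mu^{(0)},$$
$$x^{(t)} = \frac{1}{1 + dk^\beta}\,y + \frac{dk^\beta}{1 + dk^\beta}\,A\mu^{(t)}.$$
The result follows from the fact that as $t \to \infty$, $\mu^{(t)} \to \mu^\beta$ and $x^{(t)} \to x^\beta$ (Theorem 2). ■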
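Expression (13a) is the Neumann series for the fixed point of $\mu = \gamma^\beta y + (1-\gamma^\beta)P\mu$. The Python sketch below checks this on the complete graph with four nodes (so $d = 3$), assuming that $P$ averages the $d - 1$ messages feeding each directed edge (which makes $P$ doubly stochastic) and that the edge vector $y$ has entries $y_{ij} = y_i$; the value of $\gamma$ is arbitrary.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def apply_P(nbrs: Dict[int, List[int]], v: Dict[Edge, float]) -> Dict[Edge, float]:
    """(P v)_ij = average of v_ui over the d - 1 neighbors u of i other than j."""
    return {(i, j): sum(v[(u, i)] for u in nbrs[i] if u != j) / (len(nbrs[i]) - 1)
            for (i, j) in v}


n, gamma = 4, 0.3  # complete graph K_4, so d = 3
nbrs = {i: [j for j in range(n) if j != i] for i in range(n)}
edges = [(i, j) for i in nbrs for j in nbrs[i]]
y_node = [2.0, -1.0, 0.5, 3.0]
y = {(i, j): y_node[i] for (i, j) in edges}  # edge vector with y_ij = y_i

# Truncated series (13a): sum over tau of gamma (1 - gamma)^tau P^tau y.
series = {e: 0.0 for e in edges}
term, weight = dict(y), gamma
for _ in range(400):
    for e in edges:
        series[e] += weight * term[e]
    term = apply_P(nbrs, term)
    weight *= 1 - gamma

# Fixed point of mu = gamma y + (1 - gamma) P mu, by repeated substitution.
mu = {e: 0.0 for e in edges}
for _ in range(400):
    Pmu = apply_P(nbrs, mu)
    mu = {e: gamma * y[e] + (1 - gamma) * Pmu[e] for e in edges}

print(max(abs(series[e] - mu[e]) for e in edges))
```

Because $P$ is doubly stochastic and the weights $\gamma(1-\gamma)^\tau$ sum to one, the series also preserves the average of $y$.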

The following lemma provides an estimate of the distance between fixed points $\mu^\beta$ and $\mu^{\beta'}$ in terms of $|\gamma^{\beta'} - \gamma^\beta|$.

Lemma 9: Given $0 < \beta' \le \beta$, we have
$$\|\mu^\beta - \mu^{\beta'}\|_{2,nd} \le \tau^*(\gamma^{\beta'} - \gamma^\beta)(1 + 4/\gamma^\beta).$$
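Proof: Using (13a),
$$\|\mu^\beta - \mu^{\beta'}\|_{2,nd} = \left\|\sum_{\tau=0}^{\infty}\left(\gamma^\beta(1-\gamma^\beta)^\tau - \gamma^{\beta'}(1-\gamma^{\beta'})^\tau\right)P^\tau y\right\|_{2,nd}.$$
Since
$$\sum_{\tau=0}^{\infty}\left(\gamma^\beta(1-\gamma^\beta)^\tau - \gamma^{\beta'}(1-\gamma^{\beta'})^\tau\right) = 0,$$
we may subtract $P^*y$ from each $P^\tau y$ and then sum by parts, which gives
$$\begin{aligned}
\|\mu^\beta - \mu^{\beta'}\|_{2,nd} &= \left\|\sum_{\tau=0}^{\infty}\sum_{s=\tau}^{\infty}\left((\gamma^\beta)^2(1-\gamma^\beta)^s - (\gamma^{\beta'})^2(1-\gamma^{\beta'})^s\right)(P^\tau - P^*)y\right\|_{2,nd} \\
&= \left\|\sum_{s=0}^{\infty}\left((\gamma^\beta)^2(1-\gamma^\beta)^s - (\gamma^{\beta'})^2(1-\gamma^{\beta'})^s\right)\sum_{\tau=0}^{s}(P^\tau - P^*)y\right\|_{2,nd} \\
&\le \tau^*\sum_{s=0}^{\infty}\left|(\gamma^\beta)^2(1-\gamma^\beta)^s - (\gamma^{\beta'})^2(1-\gamma^{\beta'})^s\right|.
\end{aligned}$$
Hence, we wish to bound the sum
$$\Lambda = \sum_{s=0}^{\infty}\left|(\gamma^\beta)^2(1-\gamma^\beta)^s - (\gamma^{\beta'})^2(1-\gamma^{\beta'})^s\right|.$$
Set
$$T = \frac{2(\log\gamma^{\beta'} - \log\gamma^\beta)}{\log(1-\gamma^\beta) - \log(1-\gamma^{\beta'})}.$$
Note that
$$(\gamma^\beta)^2(1-\gamma^\beta)^s < (\gamma^{\beta'})^2(1-\gamma^{\beta'})^s \ \text{ if } s < T, \qquad (\gamma^\beta)^2(1-\gamma^\beta)^s > (\gamma^{\beta'})^2(1-\gamma^{\beta'})^s \ \text{ if } s > T.$$
Holding $\gamma^\beta$ fixed, it is easy to verify that $T$ is non-decreasing as $\gamma^{\beta'} \downarrow \gamma^\beta$. Hence,
$$T \le \lim_{\gamma^{\beta'}\downarrow\gamma^\beta}\frac{2(\log\gamma^{\beta'} - \log\gamma^\beta)}{\log(1-\gamma^\beta) - \log(1-\gamma^{\beta'})} = \frac{2(1-\gamma^\beta)}{\gamma^\beta}.$$
Let $L = \lfloor T \rfloor$. Using the above results and the identity $\sum_{s=0}^{\infty}\gamma^2(1-\gamma)^s = \gamma$,
$$\begin{aligned}
\Lambda &= \sum_{s=0}^{L}\left((\gamma^{\beta'})^2(1-\gamma^{\beta'})^s - (\gamma^\beta)^2(1-\gamma^\beta)^s\right) + \sum_{s=L+1}^{\infty}\left((\gamma^\beta)^2(1-\gamma^\beta)^s - (\gamma^{\beta'})^2(1-\gamma^{\beta'})^s\right) \\
&= \gamma^{\beta'} - \gamma^\beta - 2\gamma^{\beta'}(1-\gamma^{\beta'})^{L+1} + 2\gamma^\beta(1-\gamma^\beta)^{L+1} \\
&\le \gamma^{\beta'} - \gamma^\beta + 2\gamma^{\beta'}\left((1-\gamma^\beta)^{L+1} - (1-\gamma^{\beta'})^{L+1}\right).
\end{aligned} \tag{14}$$
Now, note that if $0 < a \le b \le 1$, then for integer $\ell > 0$,
$$b^\ell - a^\ell = b^\ell\left(1 - (a/b)^\ell\right) = b^\ell(1 - a/b)\sum_{i=0}^{\ell-1}(a/b)^i \le \ell b^{\ell-1}(b - a) \le \ell(b - a).$$
Applying this inequality with $a = 1-\gamma^{\beta'}$, $b = 1-\gamma^\beta$, and $\ell = L + 1 \le T + 1$, and using (14), we have
$$\begin{aligned}
\Lambda &\le (\gamma^{\beta'} - \gamma^\beta)\left(1 + 2(T+1)\gamma^{\beta'}\right) \\
&\le (\gamma^{\beta'} - \gamma^\beta)\left(1 + 2\gamma^{\beta'}(2/\gamma^\beta - 1)\right) \\
&\le (\gamma^{\beta'} - \gamma^\beta)(1 + 4\gamma^{\beta'}/\gamma^\beta) \\
&\le (\gamma^{\beta'} - \gamma^\beta)(1 + 4/\gamma^\beta),
\end{aligned}$$
which completes the proof. ■

The following lemma characterizes the rate at which $\gamma_t \downarrow \gamma^\beta$.

Lemma 10: Assume that $\gamma^\beta \le \gamma_0 \le 1$. Then, $\{\gamma_t\}$ is a non-increasing sequence and
$$|\gamma_t - \gamma^\beta| \le \left(\frac{d-1}{(1/\beta + \gamma^\beta + d - 1)^2}\right)^t|\gamma_0 - \gamma^\beta|.$$
Proof: Define the function
$$f(\gamma) = \frac{1}{1 + \dfrac{d-1}{1/\beta + \gamma}}.$$
Note that, from the definition of $\gamma_t$ and (8), $\gamma_t = f(\gamma_{t-1})$. Further, from the definition of $\gamma^\beta$ and (7), it is clear that $\gamma^\beta = f(\gamma^\beta)$. Since $k_0 \le k^\beta$, we have $\gamma_0 \ge \gamma^\beta$, and since $k_t \uparrow k^\beta$ (from Lemma 1(ii)), $\gamma_t \downarrow \gamma^\beta$. Also, if $\gamma \in [\gamma^\beta, 1]$,
$$f'(\gamma) = \frac{d-1}{(1/\beta + \gamma + d - 1)^2} \le \frac{d-1}{(1/\beta + \gamma^\beta + d - 1)^2}.$$
Then, by the Mean Value Theorem,
$$\begin{aligned}
|\gamma_t - \gamma^\beta| &= |f(\gamma_{t-1}) - f(\gamma^\beta)| \le \max_{\gamma \in [\gamma^\beta, \gamma_{t-1}]}|f'(\gamma)|\,|\gamma_{t-1} - \gamma^\beta| \\
&\le \frac{d-1}{(1/\beta + \gamma^\beta + d - 1)^2}\,|\gamma_{t-1} - \gamma^\beta| \le \left(\frac{d-1}{(1/\beta + \gamma^\beta + d - 1)^2}\right)^t|\gamma_0 - \gamma^\beta|.
\end{aligned}$$
■

The following lemma establishes a bound on the distance between $x^{(t)}$ and $\bar{y}\mathbf{1}$ in terms of the distance between $\mu^{(t)}$ and $\mu^\beta$.

Lemma 11:
$$\|x^{(t)} - \bar{y}\mathbf{1}\|_{2,n} \le \gamma_t + \gamma^\beta\tau^* + \|\mu^{(t)} - \mu^\beta\|_{2,nd}.$$
Proof: First, note that, using (13a),
$$\begin{aligned}
\|\mu^\beta - P^*y\|_{2,nd} &= \left\|\sum_{\tau=0}^{\infty}\gamma^\beta(1-\gamma^\beta)^\tau(P^\tau - P^*)y\right\|_{2,nd} \\
&= \left\|\sum_{s=0}^{\infty}(\gamma^\beta)^2(1-\gamma^\beta)^s\sum_{\tau=0}^{s}(P^\tau - P^*)y\right\|_{2,nd} \\
&\le (\gamma^\beta)^2\tau^*\sum_{s=0}^{\infty}(1-\gamma^\beta)^s = \gamma^\beta\tau^*.
\end{aligned} \tag{15}$$
Next, using Theorem 1, (13b), Lemma 7, and (15), we have
$$\bar{y}\mathbf{1} = \lim_{\beta\to\infty}x^\beta = \lim_{\beta\to\infty}\left(\frac{1}{1 + dk^\beta}\,y + \frac{dk^\beta}{1 + dk^\beta}\,A\mu^\beta\right) = \lim_{\beta\to\infty}A\mu^\beta = AP^*y.$$
Now,
$$\begin{aligned}
\|x^{(t)} - \bar{y}\mathbf{1}\|_{2,n} &= \left\|\frac{1}{1 + dk_t}(y - \bar{y}\mathbf{1}) + \frac{dk_t}{1 + dk_t}(A\mu^{(t)} - \bar{y}\mathbf{1})\right\|_{2,n} \\
&\le \frac{1}{1 + dk_t}\|y - \bar{y}\mathbf{1}\|_{2,n} + \frac{dk_t}{1 + dk_t}\|A\mu^{(t)} - \bar{y}\mathbf{1}\|_{2,n} \\
&\le \gamma_t + \|A\mu^{(t)} - \bar{y}\mathbf{1}\|_{2,n} \\
&= \gamma_t + \|A\mu^{(t)} - AP^*y\|_{2,n}.
\end{aligned}$$
By examining the structure of $A$, it follows from the Cauchy-Schwarz inequality that
$$\|A(\mu^{(t)} - P^*y)\|_{2,n} \le \|\mu^{(t)} - P^*y\|_{2,nd}.$$
Thus, using (15),
$$\begin{aligned}
\|x^{(t)} - \bar{y}\mathbf{1}\|_{2,n} &\le \gamma_t + \|\mu^{(t)} - P^*y\|_{2,nd} \\
&\le \gamma_t + \|\mu^\beta - P^*y\|_{2,nd} + \|\mu^{(t)} - \mu^\beta\|_{2,nd} \\
&\le \gamma_t + \gamma^\beta\tau^* + \|\mu^{(t)} - \mu^\beta\|_{2,nd}.
\end{aligned}$$
■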
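The geometric rate in Lemma 10 can be observed by iterating $\gamma_t = f(\gamma_{t-1})$. The Python sketch below starts from $\gamma_0 = 1$ (that is, $k_0 = 0$, assuming $\gamma_t = 1/(1 + (d-1)k_t)$ as suggested by the proof), and checks monotonicity and the bound $|\gamma_t - \gamma^\beta| \le \rho^t|\gamma_0 - \gamma^\beta|$ with $\rho = (d-1)/(1/\beta + \gamma^\beta + d - 1)^2$, for illustrative values of $\beta$ and $d$.

```python
def f(g: float, beta: float, d: int) -> float:
    """f(gamma) = 1 / (1 + (d - 1) / (1/beta + gamma)), as in the proof of Lemma 10."""
    return 1.0 / (1.0 + (d - 1) / (1.0 / beta + g))


beta, d = 10.0, 3
gs = [1.0]  # gamma_0 = 1 corresponds to k_0 = 0
for _ in range(80):
    gs.append(f(gs[-1], beta, d))
g_star = gs[-1]  # numerically converged gamma^beta
rho = (d - 1) / (1.0 / beta + g_star + d - 1) ** 2

for t in range(6):
    print(t, gs[t] - g_star, rho ** t * (gs[0] - g_star))
```

Since $f'$ is decreasing in $\gamma$, the largest slope on $[\gamma^\beta, \gamma_{t-1}]$ is attained at $\gamma^\beta$, which is why $\rho$ is evaluated there.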

B. Proof of Theorem 3

Theorem 3 follows immediately from the following lemma.

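Lemma 12: Fix $\epsilon > 0$, and pick $\beta$ so that
$$\beta > \max\left\{\left(2(1+\tau^*)/\epsilon - 1/2\right)^2/4,\ 9/16\right\}, \quad \text{if } d = 2,$$
$$\beta > \max\left\{2(1+\tau^*)/(\epsilon(d-2)),\ 3/(d-2)\right\}, \quad \text{if } d > 2.$$
Assume that $k_0 \le k^\beta$. Define
$$t^* = (1 + 2\sqrt{\beta})\log\left(\frac{2 + 9\tau^*(5 + 8\sqrt{\beta})(1/2 + \sqrt{\beta})}{\epsilon/2}\right)$$
if $d = 2$, and
$$t^* = (1 + (d-1)\beta)\log\left(\frac{2 + 4\tau^*(5 + 4(d-1)\beta)}{\epsilon/2}\right)$$
if $d > 2$. Then, $t^*$ is an $\epsilon$-convergence time.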

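Proof: Let $\beta_t$ be the value of $\beta$ implied by $k_t$, that is, the unique value such that $k_t = k^{\beta_t}$. Define
$$\Delta_t = \|\mu^{(t)} - \mu^{\beta_t}\|_{2,nd}.$$
Note that the matrix $P$ is doubly stochastic and hence non-expansive under the $\|\cdot\|_{2,nd}$ norm. Then, from (9) and the fact that $\mu^{\beta_t}$ is a fixed point,
$$\begin{aligned}
\Delta_t &= \|\gamma_t y + (1-\gamma_t)P\mu^{(t-1)} - \gamma_t y - (1-\gamma_t)P\mu^{\beta_t}\|_{2,nd} \\
&\le (1-\gamma^\beta)\|\mu^{(t-1)} - \mu^{\beta_t}\|_{2,nd} \\
&\le (1-\gamma^\beta)\left(\Delta_{t-1} + \|\mu^{\beta_{t-1}} - \mu^{\beta_t}\|_{2,nd}\right).
\end{aligned}$$
Now, using Lemmas 9 and 10,
$$\begin{aligned}
\Delta_t &\le (1-\gamma^\beta)\left(\Delta_{t-1} + \tau^*(\gamma_{t-1} - \gamma_t)(1 + 4/\gamma_t)\right) \\
&\le (1-\gamma^\beta)\left(\Delta_{t-1} + \tau^*(\gamma_{t-1} - \gamma^\beta)(1 + 4/\gamma^\beta)\right) \\
&\le (1-\gamma^\beta)\left(\Delta_{t-1} + \tau^* a^{t-1}(1 + 4/\gamma^\beta)\right).
\end{aligned} \tag{16}$$
Here, we define
$$a = \begin{cases} 1/(\gamma^\beta + 1)^2, & \text{if } d = 2, \\ 1/(d-1), & \text{if } d > 2. \end{cases}$$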

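We would like to ensure that $a < 1 - \gamma^\beta$. For $d = 2$, some algebra reveals that this is true when $0 < \gamma^\beta < (\sqrt{5} - 1)/2$. By the fact that $\beta > 9/16$ and Lemma 7, we have
$$0 < \gamma^\beta \le \frac{1}{2\sqrt{\beta} + 1/2} < \frac{1}{2} < \frac{\sqrt{5} - 1}{2}.$$
For $d > 2$, using the fact that $\beta > 3/(d-2)$ and Lemma 7,
$$\frac{a}{1 - \gamma^\beta} \le \frac{1}{(d-1)\left(1 - \frac{1}{(d-2)\beta}\right)} = \frac{(d-2)\beta}{(d-1)\left((d-2)\beta - 1\right)} < \frac{3}{2(d-1)} \le \frac{3}{4} < 1. \tag{17}$$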



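By induction using (16), we have
$$\begin{aligned}
\Delta_t &\le (1-\gamma^\beta)^t + \tau^*(1 + 4/\gamma^\beta)\sum_{s=0}^{t-1}(1-\gamma^\beta)^{t-s}a^s \\
&\le (1-\gamma^\beta)^t\left(1 + \tau^*\frac{1 + 4/\gamma^\beta}{1 - a/(1-\gamma^\beta)}\right).
\end{aligned}$$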

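Now, notice that using the above results and Lemmas 9, 10, and 11,
$$\begin{aligned}
\|x^{(t)} - \bar{y}\mathbf{1}\|_{2,n} &\le \gamma_t + \gamma^\beta\tau^* + \Delta_t + \|\mu^{\beta_t} - \mu^\beta\|_{2,nd} \\
&\le \gamma^\beta(1+\tau^*) + (\gamma_t - \gamma^\beta) + \Delta_t + \tau^*(\gamma_t - \gamma^\beta)(1 + 4/\gamma^\beta) \\
&\le \gamma^\beta(1+\tau^*) + a^t + \Delta_t + \tau^* a^t(1 + 4/\gamma^\beta) \\
&\le \gamma^\beta(1+\tau^*) + (1-\gamma^\beta)^t\left(2 + \tau^*\frac{1 + 4/\gamma^\beta}{1 - a/(1-\gamma^\beta)}\right).
\end{aligned}$$
In the last step we used $a^t \le (1-\gamma^\beta)^t$ and the first bound on $\Delta_t$ above: writing $a^t = (1-\gamma^\beta)^t\left(a/(1-\gamma^\beta)\right)^t$, the sum $\sum_{s=0}^{t-1}(1-\gamma^\beta)^{t-s}a^s$ together with $a^t$ is at most $(1-\gamma^\beta)^t/(1 - a/(1-\gamma^\beta))$.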

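When $d = 2$, using Lemma 7 and the fact that $\beta > (2(1+\tau^*)/\epsilon - 1/2)^2/4$, we have
$$(1+\tau^*)\gamma^\beta \le \frac{1+\tau^*}{2\sqrt{\beta} + 1/2} < \epsilon/2.$$
Similarly, when $d > 2$, since $\beta > 2(1+\tau^*)/(\epsilon(d-2))$,
$$(1+\tau^*)\gamma^\beta \le \frac{1+\tau^*}{(d-2)\beta} < \epsilon/2.$$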

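Thus, we will have $\|x^{(t)} - \bar{y}\mathbf{1}\|_{2,n} < \epsilon$ if
$$(1-\gamma^\beta)^t\left(2 + \tau^*\frac{1 + 4/\gamma^\beta}{1 - a/(1-\gamma^\beta)}\right) < \epsilon/2. \tag{18}$$
This will be true when
$$t \ge \frac{1}{\gamma^\beta}\log\left(\frac{2 + \tau^*\dfrac{1 + 4/\gamma^\beta}{1 - a/(1-\gamma^\beta)}}{\epsilon/2}\right). \tag{19}$$
(We have used the fact that $\log(1-\gamma^\beta) < -\gamma^\beta$.) To complete the proof, it suffices to show that $t^*$ is an upper bound to the right-hand side of (19).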

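Consider the $d = 2$ case. From Lemma 7 it follows that
$$1/\gamma^\beta \le 1 + 2\sqrt{\beta}, \qquad 1 + 4/\gamma^\beta \le 5 + 8\sqrt{\beta}.$$
Finally,
$$\frac{1}{1 - a/(1-\gamma^\beta)} = \frac{1}{1 - \dfrac{1}{(1+\gamma^\beta)^2(1-\gamma^\beta)}} = \frac{1}{\gamma^\beta}\cdot\frac{(1+\gamma^\beta)^2(1-\gamma^\beta)}{1 - \gamma^\beta - (\gamma^\beta)^2} = \frac{h(\gamma^\beta)}{\gamma^\beta}.$$
Since $\beta > 9/16$, from Lemma 7, $\gamma^\beta \in (0, 1/2)$. It is easy to verify that for such $\gamma^\beta$, the rational function $h(\gamma^\beta)$ satisfies $h(\gamma^\beta) \le h(1/2) = 9/2$. Thus,
$$\frac{1}{1 - a/(1-\gamma^\beta)} \le \frac{9}{2\gamma^\beta} \le \frac{9}{2}(1 + 2\sqrt{\beta}) = 9/2 + 9\sqrt{\beta}.$$
Substituting these bounds into (19) shows that $t^*$ is an upper bound on its right-hand side.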
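For the $d > 2$ case, from Lemma 7 it follows that
$$1/\gamma^\beta \le 1 + (d-1)\beta, \qquad 1 + 4/\gamma^\beta \le 5 + 4(d-1)\beta.$$
Finally, using (17),
$$\frac{1}{1 - a/(1-\gamma^\beta)} \le \frac{1}{1 - 3/4} = 4.$$
Substituting these bounds into (19) again shows that $t^*$ is an upper bound on its right-hand side. ■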
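The two facts used for $d = 2$, namely the identity $1/(1 - a/(1-\gamma^\beta)) = h(\gamma^\beta)/\gamma^\beta$ with $a = 1/(1+\gamma^\beta)^2$ and the bound $h \le 9/2$ on $(0, 1/2)$, can be spot-checked on a grid. The Python sketch below does so, together with the condition $a < 1 - \gamma^\beta$; a grid check supports, but does not replace, the algebra.

```python
def h(g: float) -> float:
    """h(gamma) = (1 + gamma)^2 (1 - gamma) / (1 - gamma - gamma^2)."""
    return (1 + g) ** 2 * (1 - g) / (1 - g - g * g)


def ratio_factor(g: float) -> float:
    """1 / (1 - a / (1 - gamma)) with a = 1 / (1 + gamma)^2 (the d = 2 case)."""
    a = 1.0 / (1.0 + g) ** 2
    return 1.0 / (1.0 - a / (1.0 - g))


grid = [k / 1000 for k in range(1, 500)]  # gamma in (0, 1/2)
print(max(h(g) for g in grid), h(0.5))
print(max(abs(ratio_factor(g) * g / h(g) - 1.0) for g in grid))
```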



C. Proof of Theorem 4

Theorem 4 follows immediately from the following lemma.

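Lemma 13: Fix $\epsilon > 0$, and pick $\beta$ so that
$$\beta > (2(1+\tau^*)/\epsilon - 1/2)^2/4, \quad \text{if } d = 2,$$
$$\beta > 2(1+\tau^*)/(\epsilon(d-2)), \quad \text{if } d > 2.$$
Assume that $k_0 = k^\beta$, and define
$$t^* = \begin{cases}(1 + 2\sqrt{\beta})\log(2/\epsilon), & \text{if } d = 2, \\ (1 + (d-1)\beta)\log(2/\epsilon), & \text{if } d > 2.\end{cases}$$
Then, $t^*$ is an $\epsilon$-convergence time.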

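Proof: Note that in this case, we have $k_t = k^\beta$ and $\gamma_t = \gamma^\beta$, for all $t \ge 0$. We will follow the same strategy as the proof of Lemma 12. Define
$$\Delta_t = \|\mu^{(t)} - \mu^\beta\|_{2,nd}.$$
Note that the matrix $P$ is doubly stochastic and hence non-expansive under the $\|\cdot\|_{2,nd}$ norm. Then, from (9) and the fact that $\mu^\beta$ is a fixed point,
$$\Delta_t = \|\gamma^\beta y + (1-\gamma^\beta)P\mu^{(t-1)} - \gamma^\beta y - (1-\gamma^\beta)P\mu^\beta\|_{2,nd} \le (1-\gamma^\beta)\Delta_{t-1} \le (1-\gamma^\beta)^t,$$
where the last step follows by induction.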



Now, notice that, using this result and Lemma 11,
$$\|x^{(t)} - \bar{y}\mathbf{1}\|_{2,n} \le \gamma^\beta(1+\tau^*) + \Delta_t \le \gamma^\beta(1+\tau^*) + (1-\gamma^\beta)^t.$$



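When $d = 2$, using Lemma 7 and the fact that $\beta > (2(1+\tau^*)/\epsilon - 1/2)^2/4$, we have
$$(1+\tau^*)\gamma^\beta \le \frac{1+\tau^*}{2\sqrt{\beta} + 1/2} < \epsilon/2.$$
Similarly, when $d > 2$, since $\beta > 2(1+\tau^*)/(\epsilon(d-2))$,
$$(1+\tau^*)\gamma^\beta \le \frac{1+\tau^*}{(d-2)\beta} < \epsilon/2.$$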



Thus, we will have $\|x^{(t)} - \bar{y}\mathbf{1}\|_{2,n} < \epsilon$ if
$$(1-\gamma^\beta)^t < \epsilon/2.$$
This will be true when
$$t \ge \frac{1}{\gamma^\beta}\log(2/\epsilon). \tag{20}$$
(We have used the fact that $\log(1-\gamma^\beta) < -\gamma^\beta$.) To complete the proof, it suffices to show that $t^*$ is an upper bound to the right-hand side of (20).

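Consider the $d = 2$ case. From Lemma 7 it follows that
$$1/\gamma^\beta \le 1 + 2\sqrt{\beta}.$$
For the $d > 2$ case, from Lemma 7 it follows that
$$1/\gamma^\beta \le 1 + (d-1)\beta.$$
In either case, $t^*$ is at least the right-hand side of (20). ■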
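For concreteness, the Python sketch below evaluates the $\beta$ thresholds and the $\epsilon$-convergence times $t^*$ of Lemmas 12 and 13 for given $\epsilon$, $d$, and mixing time $\tau^*$; the function names and sample inputs are illustrative. Since the argument of the logarithm in Lemma 12 is at least $4/\epsilon$, its $t^*$ exceeds that of Lemma 13 for the same $\beta$, reflecting the cost of not starting at $k_0 = k^\beta$.

```python
import math


def beta_threshold(eps: float, tau: float, d: int) -> float:
    """Lower bound on beta shared by Lemmas 12 and 13 (Lemma 12 also needs
    beta > 9/16 when d = 2, or beta > 3/(d - 2) when d > 2)."""
    if d == 2:
        return (2 * (1 + tau) / eps - 0.5) ** 2 / 4
    return 2 * (1 + tau) / (eps * (d - 2))


def t_star_cold(beta: float, eps: float, tau: float, d: int) -> float:
    """t* of Lemma 12 (k_0 <= k^beta)."""
    if d == 2:
        r = math.sqrt(beta)
        c = 2 + 9 * tau * (5 + 8 * r) * (0.5 + r)
        return (1 + 2 * r) * math.log(c / (eps / 2))
    c = 2 + 4 * tau * (5 + 4 * (d - 1) * beta)
    return (1 + (d - 1) * beta) * math.log(c / (eps / 2))


def t_star_warm(beta: float, eps: float, d: int) -> float:
    """t* of Lemma 13 (k_0 = k^beta)."""
    factor = 1 + 2 * math.sqrt(beta) if d == 2 else 1 + (d - 1) * beta
    return factor * math.log(2 / eps)


eps, tau = 0.1, 10.0
for d in (2, 3):
    floor = 9 / 16 if d == 2 else 3 / (d - 2)
    beta = 1.01 * max(beta_threshold(eps, tau, d), floor)
    print(d, beta, t_star_warm(beta, eps, d), t_star_cold(beta, eps, tau, d))
```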



Appendix III
Proof of Theorem 5

Theorem 5: For the cycle with $n$ nodes, $\tau^* \le n/\sqrt{2}$.
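Proof: Let $e^{ij} \in \mathbb{R}^{2n}$ be the vector with $(i,j)$th component equal to 1 and each other component equal to 0. It is easy to see that for any $(i,j) \in \vec{E}$,
$$\sup_t\left\|\sum_{\tau=0}^{t}(P^\tau - P^*)e^{ij}\right\|_{2,2n} \le \left\|\sum_{\tau=0}^{\lfloor n/2\rfloor - 1}(P^\tau - P^*)e^{ij}\right\|_{2,2n} \le \frac{1}{2\sqrt{2}}.$$
We then have
$$\begin{aligned}
\tau^* &= \sup_{t,\mu}\frac{\left\|\sum_{\tau=0}^{t}(P^\tau - P^*)\mu\right\|_{2,2n}}{\|\mu\|_{2,2n}} \\
&\le \sup_{\mu}\frac{\sum_{(i,j)}|\mu_{ij}|\,\sup_t\left\|\sum_{\tau=0}^{t}(P^\tau - P^*)e^{ij}\right\|_{2,2n}}{\|\mu\|_{2,2n}} \\
&\le \sup_{\mu}\frac{\sum_{(i,j)}|\mu_{ij}|}{2\sqrt{2}\,\|\mu\|_{2,2n}} = \sup_{\mu}\frac{\sum_{(i,j)}|\mu_{ij}|}{2\sqrt{2}\sqrt{\sum_{(i,j)}\mu_{ij}^2/(2n)}} \\
&\le \frac{n}{\sqrt{2}},
\end{aligned}$$
where the last step follows from the Cauchy-Schwarz inequality. ■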
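The $1/(2\sqrt{2})$ bound can be reproduced with exact rational arithmetic. The Python sketch below models one orientation of the cycle, where $P$ shifts a unit vector one edge forward, and takes $P^*$ to average uniformly over that orientation's $n$ edges; both modelling choices are our assumptions about the objects in the proof. A partial sum with $r$ terms then has $r$ entries equal to $1 - r/n$ and $n - r$ entries equal to $-r/n$, so its squared $\|\cdot\|_{2,2n}$ norm is $r(n-r)/(2n^2) \le 1/8$.

```python
from fractions import Fraction


def worst_partial_sum_norm_sq(n: int) -> Fraction:
    """Max over t of the squared ||.||_{2,2n} norm of sum_{tau=0}^{t} (P^tau - P*) e,
    where P shifts one orientation of the n-cycle by one edge and P* averages
    uniformly over that orientation (modelling assumptions)."""
    best = Fraction(0)
    for r in range(1, n + 1):  # r = number of terms modulo the period n
        counts = [Fraction(1) if k < r else Fraction(0) for k in range(n)]
        v = [c - Fraction(r, n) for c in counts]      # subtract r * P* e
        norm_sq = sum(x * x for x in v) / (2 * n)     # 2n directed edges in total
        best = max(best, norm_sq)
    return best


for n in (6, 7, 10):
    print(n, worst_partial_sum_norm_sq(n))
```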



